Abstract
In recent years, with the development of artificial intelligence, the convolutional neural network (CNN), a common deep learning algorithm, has been widely applied in fields such as computer vision, speech recognition, and natural language processing. Field-programmable gate arrays (FPGAs) are often used to accelerate CNNs because of their high parallelism and flexibility. This paper therefore studies the design of a high-performance FPGA-based CNN accelerator. The design uses DSP cascading, a ping-pong structure for convolution kernel data, multichannel parallel computation, and reuse of feature-map and convolution-kernel data to provide high-performance acceleration for CNN computation on a resource-constrained FPGA platform. Experimental results show that the design uses relatively few LUT resources. On a Virtex7 VX690T, the accelerator reaches a peak performance of 1.6 TOPS and a throughput of 1.334 TOPS when accelerating the VGG16 network, combining high computing performance with low resource consumption.
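As a reading aid, the two throughput figures quoted in the abstract can be related by a simple ratio of sustained to peak throughput. The sketch below is only illustrative arithmetic on those two numbers; the function name `utilization` and the constant names are hypothetical and do not come from the paper.

```python
# Figures quoted in the abstract (tera-operations per second).
PEAK_TOPS = 1.6      # peak performance on the Virtex7 VX690T
VGG16_TOPS = 1.334   # throughput reported when accelerating VGG16


def utilization(sustained: float, peak: float) -> float:
    """Return the fraction of peak throughput sustained on a workload.

    Hypothetical helper for illustration; not part of the paper's design.
    """
    if peak <= 0:
        raise ValueError("peak must be positive")
    return sustained / peak


if __name__ == "__main__":
    ratio = utilization(VGG16_TOPS, PEAK_TOPS)
    print(f"VGG16 sustained/peak ratio: {ratio:.3f}")
```

On the abstract's numbers the ratio is roughly 0.83, meaning the VGG16 workload sustains a little over four fifths of the stated peak.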
Authors
CAO Xuecheng (曹学成)
LIAO Xiangping (廖湘萍)
LI Yingying (李盈盈)
DING Yonglin (丁永林)
LI Wei (李炜)
Affiliation: China Electronics Technology Group Corporation 52nd Research Institute, Hangzhou 311100, China
Source
Technology of IoT & AI (《智能物联技术》)
2021, No. 5, pp. 11-17 (7 pages)