Abstract
The rapid development of computer vision places ever higher demands on the system performance of embedded products. On traditional Field Programmable Gate Array (FPGA) platforms, computational throughput is poorly matched to memory bandwidth, and general-purpose processors implement Convolutional Neural Networks (CNNs) inefficiently, so performance requirements are not met. To address these design bottlenecks, a high-performance neural network accelerator for face recognition is designed on the Xilinx ZC706 embedded development platform, using the classic LeNet-5 model. Building on High Level Synthesis (HLS) tools, the network model is optimized through storage optimization, fixed-point quantization, and operation optimization, yielding a 7-layer CNN accelerator. Experimental results show that the accelerator operates at 200 MHz, achieves a 126-fold speedup over the CPU and a speedup of more than 10 times over the GPU, and consumes only 2.62 W.
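As an illustration of the fixed-point quantization mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. The abstract does not state the word length or the number of fractional bits, so the 16-bit signed format with 8 fractional bits used here is an assumption, and the helper names `quantize`, `dequantize`, and `fixed_dot` are hypothetical.

```python
# Assumed format: 16-bit signed fixed point with 8 fractional bits.
# The paper's abstract does not specify these values.
FRAC_BITS = 8
TOTAL_BITS = 16


def quantize(x, frac_bits=FRAC_BITS, total_bits=TOTAL_BITS):
    """Convert a float to a saturated signed fixed-point integer."""
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    q = int(round(x * scale))
    # Saturate instead of wrapping around on overflow.
    return max(lo, min(hi, q))


def dequantize(q, frac_bits=FRAC_BITS):
    """Convert a fixed-point integer back to a float."""
    return q / (1 << frac_bits)


def fixed_dot(weights_q, inputs_q, frac_bits=FRAC_BITS):
    """Integer-only dot product, as used inside a convolution.

    Each product carries 2 * frac_bits fractional bits, so the
    accumulated sum is shifted right once to return to frac_bits.
    Python's >> on negative integers rounds toward minus infinity.
    """
    acc = 0
    for w, a in zip(weights_q, inputs_q):
        acc += w * a
    return acc >> frac_bits


weights = [quantize(0.5), quantize(0.25)]
inputs = [quantize(2.0), quantize(4.0)]
result = dequantize(fixed_dot(weights, inputs))
print(result)
```

Replacing floating-point multiply-accumulate with integer arithmetic of this kind is a common way to reduce the logic and memory an FPGA design needs; the accuracy cost depends on the chosen bit widths.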
Authors
吴进
张伟华
席萌
代巍
WU Jin; ZHANG Weihua; XI Meng; DAI Wei (School of Electronic Engineering, Xi’an University of Posts and Telecommunications, Xi’an 710121, China)
Source
《计算机工程与应用》
CSCD
Peking University Core Journal (北大核心)
2020, No. 22, pp. 48-54 (7 pages)
Computer Engineering and Applications
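Funding
National Natural Science Foundation of China (No. 61834005, No. 61772417, No. 61602377, No. 61634004)
Key Research and Development Program of Shaanxi Province (No. 2017GY-060)
Natural Science Basic Research Program of Shaanxi Province (No. 2018JM4018)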