Abstract
To improve the inference speed of convolutional neural networks on small and medium-scale devices, this paper proposes an FPGA-based hardware accelerator design for convolutional neural networks. For the convolution operation unit of the model, the accelerator adopts two-dimensional loop unrolling over the input and output dimensions together with loop tiling, and implements 128 parallel multiplier units. The input and output interfaces of the model use a double-buffer design, in which ping-pong operation reduces the latency caused by data transfers. In addition, 16-bit fixed-point quantization is applied to the weight parameters, bias parameters and the pixel values of the input and output feature maps. Experimental results show that, compared with a general-purpose Core i5-4440 CPU, the computing performance is improved by a factor of 5.77 while the accuracy on the COCO dataset remains almost unchanged. At a system clock frequency of 150 MHz, the hardware accelerator achieves a computing performance of 28.88 GOPS.
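The paper's record does not include source code; the following HLS-style C fragment is only a minimal sketch of the kind of tiled, unrolled convolution kernel the abstract describes. The tile sizes, the unroll factors (here Tm = 16 and Tn = 8, chosen so that Tm x Tn = 128 multipliers), and all identifiers such as conv_tile, in_buf, w_buf and out_buf are illustrative assumptions, not taken from the paper.

```c
/* Illustrative sketch of one output tile of the convolution.
 * Assumption: Tm x Tn = 16 x 8 = 128 parallel 16-bit multipliers, matching
 * the 128 multiplier units mentioned in the abstract. All names and sizes
 * here are hypothetical. */
#include <stdint.h>

#define Tm 16          /* output-channel unroll factor (assumed) */
#define Tn 8           /* input-channel unroll factor (assumed)  */
#define K  3           /* kernel size (assumed)                  */
#define TR 14          /* output-tile rows (assumed)             */
#define TC 14          /* output-tile cols (assumed)             */

/* Computes one tile from on-chip buffers filled by the transfer engine.
 * out_buf holds partial sums accumulated across input-channel tiles. */
void conv_tile(int16_t in_buf[Tn][TR + K - 1][TC + K - 1],
               int16_t w_buf[Tm][Tn][K][K],
               int32_t out_buf[Tm][TR][TC])
{
    for (int kr = 0; kr < K; kr++)
      for (int kc = 0; kc < K; kc++)
        for (int r = 0; r < TR; r++)
          for (int c = 0; c < TC; c++)
            /* The two innermost loops are fully unrolled in hardware,
             * giving Tm * Tn = 128 multiply-accumulates per cycle. */
            for (int tm = 0; tm < Tm; tm++)
              for (int tn = 0; tn < Tn; tn++)
                out_buf[tm][r][c] +=
                    (int32_t)w_buf[tm][tn][kr][kc] *
                    (int32_t)in_buf[tn][r + kr][c + kc];
}
```

In a real HLS design the two innermost loops would carry unroll pragmas and the buffers would be partitioned across on-chip RAM; while one set of buffers is being computed on, its ping-pong counterpart is being filled from or drained to external memory, which is the double-buffering scheme the abstract refers to.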
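Likewise, a minimal sketch of the 16-bit fixed-point quantization mentioned in the abstract, assuming a per-layer fractional bit width chosen offline; the paper states only that weights, biases and feature-map pixels use 16-bit fixed point, so the rounding and saturation policy below is an assumption.

```c
#include <stdint.h>
#include <math.h>

/* Quantize a float to signed 16-bit fixed point with frac_bits fractional
 * bits. frac_bits is assumed to be selected per layer offline. */
static int16_t quantize_q16(float x, int frac_bits)
{
    float scaled = roundf(x * (float)(1 << frac_bits));
    if (scaled >  32767.0f) scaled =  32767.0f;   /* saturate to int16 max */
    if (scaled < -32768.0f) scaled = -32768.0f;   /* saturate to int16 min */
    return (int16_t)scaled;
}

/* Dequantize back to float, e.g. for accuracy checks on the host. */
static float dequantize_q16(int16_t q, int frac_bits)
{
    return (float)q / (float)(1 << frac_bits);
}
```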
Authors
Huang Peiyu (黄沛昱); Zhao Qiang (赵强); Li Yulong (李煜龙)
College of Photoelectric Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
Source
Computer Applications and Software (《计算机应用与软件》)
Peking University Core Journal (北大核心)
2023, No. 3, pp. 38-44 (7 pages)
Funding
National Natural Science Foundation of China (61801061)
Science and Technology Research Program of Chongqing Municipal Education Commission (KJQN201800607).