Abstract
To address the high power consumption and low speed of Convolutional Neural Networks (CNNs) on resource-constrained hardware, a fixed-point computation acceleration method for CNNs based on a Field Programmable Gate Array (FPGA) is proposed. First, a fixed-point quantization method is proposed in which each convolution layer has its own scale parameter and the relative divergence is used to determine the bit width, reducing the storage required for the CNN parameters; the effect of different quantization intervals on CNN accuracy is also studied. Second, a parameter reuse (multiplexing) method and a pipelined computation method are designed to accelerate the convolution calculation. To verify the acceleration achieved after fixed-point conversion, experiments were run on two datasets, one of faces and one of ships. The results show that, compared with conventional floating-point convolution and with only a small loss in CNN accuracy, quantizing the weight parameters and input feature map parameters to 7 bits reduces the compressed weight file of the face recognition CNN model to about 22% of its original size, yields a convolution speedup of 18.69, and brings the utilization of the multiplier-accumulators in the FPGA to 94.5%. The proposed method therefore improves the speed of convolution calculation and makes efficient use of FPGA hardware resources.
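The abstract does not give the quantization formulas, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes symmetric signed fixed-point quantization with a single scale per layer, and it reads "relative divergence" as the Kullback-Leibler divergence between histograms of the original and the dequantized values. The function names, candidate bit widths, bin count, and threshold are all hypothetical.

```python
import math

def quantize(values, bits):
    """Symmetric signed fixed-point quantization with one scale per layer (assumed scheme)."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Map integer codes back to real values."""
    return [x * scale for x in q]

def histogram(values, lo, hi, bins):
    """Normalized histogram of values over [lo, hi]."""
    if hi == lo:
        hi = lo + 1.0  # avoid a zero-width range
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        idx = min(bins - 1, max(0, int((v - lo) / width)))
        counts[idx] += 1
    return [c / len(values) for c in counts]

def kl_divergence(p, q, eps=1e-9):
    """KL divergence D(p || q) after smoothing both distributions by eps."""
    ps = [x + eps for x in p]
    qs = [x + eps for x in q]
    sp, sq = sum(ps), sum(qs)
    return sum((a / sp) * math.log((a / sp) / (b / sq)) for a, b in zip(ps, qs))

def choose_bits(values, candidates=(4, 5, 6, 7, 8), threshold=0.01, bins=64):
    """Return the smallest candidate bit width whose KL divergence is under the
    (hypothetical) threshold; fall back to the largest candidate otherwise."""
    lo, hi = min(values), max(values)
    reference = histogram(values, lo, hi, bins)
    for bits in candidates:
        q, scale = quantize(values, bits)
        approx = histogram(dequantize(q, scale), lo, hi, bins)
        if kl_divergence(reference, approx) <= threshold:
            return bits
    return candidates[-1]
```

As a rough plausibility check, storing 7-bit codes in place of 32-bit floating-point values gives 7/32 ≈ 0.22, which is consistent with the reported file size of about 22%, although the abstract does not state the original precision or how the figure was computed.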
Authors
LEI Xiaokang; YIN Zhigang; ZHAO Ruilian
(School of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China; Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China)
Source
Journal of Computer Applications (《计算机应用》)
CSCD
Peking University Core Journals (北大核心)
2020, Issue 10, pp. 2811-2816 (6 pages)
Funding
National Natural Science Foundation of China (61672085).
Keywords
Convolutional Neural Network (CNN)
fixed-point quantization
Field Programmable Gate Array (FPGA)
model compression
YOLO model