Abstract
Convolutional neural networks (CNNs) have achieved great success in a wide range of computer vision applications. This paper studies the parallel structure of CNNs and, based on the several forms of parallelism present in network computation, proposes an architecture for computing the CNN forward-propagation pass in parallel on an FPGA. Experimental results show that, at an operating frequency of 110 MHz, the architecture reaches a peak FPGA throughput of 0.48 GOP/s, a 23.5× speedup over the ARM Mali-T628 GPU platform.
Authors
JIANG Lin (蒋林)
WANG Xi-juan (王喜娟)
LIU Zhen-tao (刘镇弢)
XIE Xiao-yan (谢晓燕)
HENG Qian (衡茜)
College of Electronic Engineering, Xi'an University of Posts and Telecommunications, Xi'an 710121, China; College of Computer Science, Xi'an University of Posts and Telecommunications, Xi'an 710121, China
Source
Microelectronics & Computer (《微电子学与计算机》)
CSCD
Peking University Core Journal (北大核心)
2018, Issue 8, pp. 132-136 (5 pages)
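Funding
National Natural Science Foundation of China (61772417, 61602377, 61634004, 61272120)
Shaanxi Province Science and Technology Coordination and Innovation Project (2016KTZDGY02-04-02)
Shaanxi Province Key Research and Development Program (2017GY-060)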
Keywords
convolutional neural network
field-programmable gate array
array processor
parallelism