Abstract
To improve the computational efficiency and power efficiency of artificial-intelligence accelerators, a new convolutional neural network (CNN) accelerator architecture is proposed, implementing a computing-in-memory approach. First, a neural network architecture is designed that supports highly parallel computation and keeps the multiply-accumulate (MAC) units running efficiently. Second, to reduce power consumption and die area, a symmetric static random-access memory (SRAM) array with an adjustable data-flow structure is adopted, enabling efficient multi-layer network computation inside the SRAM; this reduces the number of external memory accesses, lowering system power consumption and improving computational efficiency. The system-on-chip (SoC) design, tape-out, and testing were completed in SMIC's 40 nm process. Results show that at 500 MHz the accelerator reaches 288 GOPS; full-speed power consumption is 89.4 mW; the area is 1.514 mm²; power efficiency is 3.22 TOPS/W; and area efficiency in the 40 nm process is 95.1 GOPS/mm². Compared with previously published designs, power efficiency improves by at least 4.54% and area efficiency by at least 134%, making the accelerator well suited to embedded applications.
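The reported power-efficiency figure follows directly from the measured throughput and power. A minimal sketch of that arithmetic, using only the numbers stated in the abstract:

```python
# Power efficiency of the accelerator, from the figures reported above.
throughput_gops = 288.0  # measured throughput at 500 MHz, in GOPS
power_mw = 89.4          # full-speed power consumption, in mW

# TOPS/W = (operations per second) / (watts), expressed in tera-ops per watt.
ops_per_second = throughput_gops * 1e9
watts = power_mw * 1e-3
tops_per_watt = ops_per_second / watts / 1e12

print(f"{tops_per_watt:.2f} TOPS/W")  # → 3.22 TOPS/W
```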
Authors
Yi Dongbai (易冬柏)
Chen Heng (陈恒)
He Lenian (何乐年)
(College of Information Science & Electronic Engineering, Zhejiang University, Hangzhou 310007, China; Zhuhai Edgeless Semiconductor Co., Ltd., Zhuhai 519000, China)
Source
《仪器仪表学报》 (Chinese Journal of Scientific Instrument)
Indexed in: EI, CAS, CSCD, Peking University Core (北大核心)
2021, No. 7, pp. 155-163 (9 pages)
Keywords
artificial intelligence
accelerator
convolutional neural networks
edge
convolutional neural processor