Abstract
Different types of dot product algorithms are optimized and implemented on FT-M7002, a Chinese-made high-performance DSP, which fills out the technology chain of this processor's math library. Making full use of the strengths of the FT-M7002 core architecture, the dot product algorithms are optimized through SIMD vector parallelization, DMA dual-channel transfer, SVR transfer, and other techniques. The work fully exploits the vector parallelism of the programs, effectively increases data-transfer speed, and improves program performance. Experimental results show that, for input arrays of different sizes, the average performance ratio of the optimized to the unoptimized versions of the different dot product algorithms on FT-M7002 ranges from 12.4166 to 45.2338. Compared with the dot product functions of the dsplib library from TI's official website running on the TMS320C6678 processor, the average performance ratio of the optimized FT-M7002 versions to the TI versions ranges from 1.3716 to 4.5196. These results demonstrate the computational performance advantage of this DSP platform over TI's mainstream platform.
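The abstract names SIMD vector parallelization and DMA dual-channel transfer without showing code. The portable C sketch below illustrates the general idea under stated assumptions; it is not the authors' implementation and uses no FT-M7002 intrinsics or DMA APIs. `LANES` and `BLOCK` are hypothetical parameters, not FT-M7002 values; an array of `LANES` partial sums stands in for a vector register; and `memcpy` stands in for DMA transfers into two ping-pong buffers per input array. Reading "dual-channel" as ping-pong buffering is itself an assumption; the paper's exact scheme may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tuning parameters (assumptions, not FT-M7002 values). */
#define LANES 16  /* assumed SIMD width: number of independent partial sums */
#define BLOCK 256 /* assumed on-chip buffer size in elements */

/* Keep full blocks free of partial-vector tails. */
static_assert(BLOCK % LANES == 0, "BLOCK must be a multiple of LANES");

/* Reference scalar dot product: the unoptimized baseline. */
float dot_scalar(const float *x, const float *y, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i)
        acc += x[i] * y[i];
    return acc;
}

/* Accumulate one buffered block into LANES partial sums; each lane
   mimics one element of a vector register. */
static void accumulate_block(const float *x, const float *y, size_t len,
                             float lane[LANES]) {
    size_t i = 0;
    for (; i + LANES <= len; i += LANES)
        for (size_t k = 0; k < LANES; ++k)
            lane[k] += x[i + k] * y[i + k];
    for (; i < len; ++i) /* tail that does not fill a whole vector */
        lane[i % LANES] += x[i] * y[i];
}

/* Blocked dot product with ping-pong buffers: while block b is being
   accumulated, block b + 1 is copied into the other buffer. */
float dot_blocked(const float *x, const float *y, size_t n) {
    float bx[2][BLOCK], by[2][BLOCK]; /* stand-ins for on-chip memory */
    float lane[LANES] = {0};
    size_t nblocks = (n + BLOCK - 1) / BLOCK;
    if (nblocks == 0)
        return 0.0f;

    /* "DMA" the first block into buffer 0. */
    size_t len0 = n < BLOCK ? n : BLOCK;
    memcpy(bx[0], x, len0 * sizeof *x);
    memcpy(by[0], y, len0 * sizeof *y);

    for (size_t b = 0; b < nblocks; ++b) {
        size_t cur = b % 2, nxt = 1 - cur;
        size_t off = b * BLOCK;
        size_t len = (n - off < BLOCK) ? n - off : BLOCK;
        if (b + 1 < nblocks) {
            size_t noff = off + BLOCK;
            size_t nlen = (n - noff < BLOCK) ? n - noff : BLOCK;
            /* On real hardware these transfers would be issued
               asynchronously and overlap with the compute below;
               memcpy here is synchronous. */
            memcpy(bx[nxt], x + noff, nlen * sizeof *x);
            memcpy(by[nxt], y + noff, nlen * sizeof *y);
        }
        accumulate_block(bx[cur], by[cur], len, lane);
    }

    /* Final horizontal reduction of the lane partial sums. */
    float acc = 0.0f;
    for (size_t k = 0; k < LANES; ++k)
        acc += lane[k];
    return acc;
}
```

Because `memcpy` is synchronous, the sketch shows only the data layout and control flow of transfer/compute overlap, not its speedup. The lane-wise accumulation also reorders the additions, so on general floating-point data `dot_blocked` can differ from `dot_scalar` in the last bits.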
Authors
GUO Pan-pan
CHEN Meng-xue
LIANG Zu-da
MA Xiao-chang
XU Bang-jian
(School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450066; National Supercomputing Center in Zhengzhou (Zhengzhou University), Zhengzhou 450001; School of Electrical and Information Engineering, Hunan University, Changsha 410082; School of Information Science and Engineering, Hunan University, Changsha 410082, China)
Source
Computer Engineering & Science (《计算机工程与科学》)
CSCD
Peking University Core Journal
2022, No. 11, pp. 1909-1917, 9 pages
Keywords
FT-M7002
digital signal processor (DSP)
dot product algorithm
vector
DMA dual-channel transfer
SVR transfer