
Efficient Implementation of FFT on Loongson 3A CPU (龙芯3A处理器上FFT的高效实现)

Cited by: 6
Abstract: FFT (fast Fourier transform) is a basic algorithm in many engineering applications, so optimizing its performance matters for promoting wider use of the Loongson processor family. This paper exploits the hardware characteristics of the Loongson 3A processor: it reduces the operation count, optimizes the bit-reversal reordering step, and uses the Loongson 3A's 128-bit memory access instructions to lower the proportion of memory access instructions, yielding an efficient FFT implementation. Experiments on an 825 MHz Loongson 3A processor show that the optimized one-dimensional FFT runs about 2.5 times as fast as the FFTW library, and the optimized two-dimensional FFT about 3 times as fast as FFTW.
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), indexed in CSCD and the Peking University Core Journals list (北大核心), 2012, No. 3, pp. 594-597 (4 pages)
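Funding: Supported by the National Natural Science Foundation of China (60833004) and the National High Technology Research and Development Program of China ("863" Program) (2008AA010902)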
Keywords: Loongson 3A; FFT; performance optimization; KD-60
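
The abstract names two steps of the FFT, the butterfly computation and the bit-reversal reordering, without giving the algorithm itself. For orientation only, the following is a minimal textbook sketch of an iterative radix-2 Cooley-Tukey FFT in Python. It is not the authors' optimized Loongson 3A implementation, it does not model 128-bit memory access, and the function names `bit_reverse_permute` and `fft_radix2` are hypothetical names chosen for this illustration.

```python
import cmath


def bit_reverse_permute(x):
    """Return a copy of x reordered by bit-reversed index.

    Assumes len(x) is a power of two.
    """
    n = len(x)
    bits = n.bit_length() - 1  # number of index bits, log2(n)
    out = list(x)
    for i in range(n):
        # Reverse the binary representation of i over `bits` bits.
        j = int(format(i, f"0{bits}b")[::-1], 2) if bits else 0
        if i < j:  # swap each pair only once
            out[i], out[j] = out[j], out[i]
    return out


def fft_radix2(x):
    """Iterative radix-2 decimation-in-time FFT (generic textbook form)."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    a = [complex(v) for v in bit_reverse_permute(x)]
    size = 2
    while size <= n:
        half = size // 2
        w_step = cmath.exp(-2j * cmath.pi / size)  # primitive root for this stage
        for start in range(0, n, size):
            w = 1 + 0j
            for k in range(half):
                # One butterfly: combine the two half-size sub-transforms.
                u = a[start + k]
                t = w * a[start + k + half]
                a[start + k] = u + t
                a[start + k + half] = u - t
                w *= w_step
        size *= 2
    return a
```

In this form the reordering pass touches every element once before any arithmetic is done, and each butterfly loads and stores two complex values. These are the two places the abstract says the paper targets: the bit-reversal process and the share of instructions spent on memory access.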