Abstract
The fast Fourier transform (FFT) plays a central role in digital signal processing. As demand grows for high-performance FFTs with ultra-long point counts, the computing power of digital signal processors (DSPs) increasingly falls short, and integrating FFT accelerators has become an important trend. To support ultra-long FFTs, this paper generalizes the two-dimensional decomposition algorithm to multiple dimensions and proposes a high-performance ultra-long-point FFT accelerator architecture that can be integrated into a DSP. The architecture implements three-dimensional transposition through a conflict-free bank addressing method based on a prime number of memory banks, generates twiddle factors efficiently with a recurrence algorithm, and refines the FFT arithmetic circuits using single-precision floating-point two-term fused dot-product and fused add-subtract operations. It supports single-precision floating-point FFTs of up to 4G points. Synthesis results show that the accelerator can run at more than 1 GHz and reach 640 Gflop/s, a substantial improvement over prior work in both supported point count and performance.
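The decomposition idea mentioned in the abstract can be illustrated in software. The following is a minimal NumPy sketch of the classic two-dimensional decomposition of a length N = N1 × N2 DFT, with the twiddle factors produced by a simple multiplicative recurrence. It is not the paper's hardware design: the function names `twiddles_by_recurrence` and `fft_2d_decomposed`, the use of double precision, and the specific recurrence are assumptions made for illustration only. A multi-dimensional variant would presumably apply the same split again to the sub-transforms.

```python
import numpy as np


def twiddles_by_recurrence(n, count):
    """Return W^0 .. W^(count-1) for W = exp(-2*pi*i/n).

    Each value is obtained from the previous one by a single complex
    multiplication (W^(k+1) = W^k * W) instead of a fresh sin/cos call.
    Illustrative only; rounding error grows slowly with count.
    """
    w = np.exp(-2j * np.pi / n)
    out = np.empty(count, dtype=np.complex128)
    out[0] = 1.0
    for k in range(1, count):
        out[k] = out[k - 1] * w
    return out


def fft_2d_decomposed(x, n1, n2):
    """Compute a length n1*n2 DFT through a two-dimensional decomposition.

    Input index  n = n2*a + b   (a < n1, b < n2)
    Output index k = c + n1*d   (c < n1, d < n2)
    """
    n = n1 * n2
    x = np.asarray(x, dtype=np.complex128)
    if x.size != n:
        raise ValueError("len(x) must equal n1 * n2")

    # View the input as an n1 x n2 matrix: grid[a, b] = x[n2*a + b].
    grid = x.reshape(n1, n2)

    # Step 1: length-n1 DFTs down each column (over index a).
    stage1 = np.fft.fft(grid, axis=0)  # stage1[c, b]

    # Step 2: multiply by twiddle factors W_N^(b*c), built by recurrence.
    row_step = twiddles_by_recurrence(n, n1)  # row_step[c] = W_N^c
    tw = np.empty((n1, n2), dtype=np.complex128)
    tw[:, 0] = 1.0
    for b in range(1, n2):
        tw[:, b] = tw[:, b - 1] * row_step
    stage2 = stage1 * tw

    # Step 3: length-n2 DFTs along each row (over index b).
    stage3 = np.fft.fft(stage2, axis=1)  # stage3[c, d]

    # Step 4: transpose so that the flat index is k = c + n1*d.
    return stage3.T.reshape(n)
```

For a quick check, the result of `fft_2d_decomposed(x, 8, 16)` on a random length-128 complex vector can be compared against `np.fft.fft(x)` with `np.allclose`. The final transpose in step 4 is the software counterpart of the transposition that the abstract says the accelerator handles through conflict-free memory bank addressing.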
Authors
王谛
石嵩
吴铁彬
刘亮
谭弘兵
郝子宇
过锋
李宏亮
Wang Di; Shi Song; Wu Tiebin; Liu Liang; Tan Hongbing; Hao Ziyu; Guo Feng; Li Hongliang (Jiangnan Institute of Computing Technology, Wuxi, Jiangsu 214083)
Source
《计算机研究与发展》
EI
CSCD
PKU Core (北大核心)
2021, No. 6, pp. 1192-1203 (12 pages)
Journal of Computer Research and Development
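Funding
National Science and Technology Major Project of China on Core Electronic Devices, High-end Generic Chips, and Basic Software ("核高基") (2018ZX01028-102).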
Keywords
fast Fourier transform (FFT)
multi-dimensional decomposition algorithm
three-dimensional transposition operation
twiddle factor generation
accelerator