Abstract
Ordered successive interference cancellation (OSIC) is a commonly used signal detection algorithm in multiple-input multiple-output (MIMO) systems, but its throughput and latency are limited by the inversion of the channel matrix. A low-complexity, high-speed matrix inversion and decomposition preprocessing stage is therefore the key to implementing the algorithm in hardware. This paper adopts a hardware-accelerated matrix preprocessing scheme that applies sorted orthogonal-triangular (QR) decomposition to the channel matrix. In the sorting stage, a fast estimate of the complex-valued 1-norm eliminates the computation of complex moduli. In the QR decomposition stage, a deeply pipelined coordinate rotation digital computer (CORDIC) iteration eliminates the explicit element vectorization and nulling-rotation angle computation of the Givens rotation. On this basis, a pipelined circuit with a reusable Givens rotation structure for QR decomposition is designed, so that the matrix decomposition requires no multipliers. Simulation results show that the bit error rate (BER) performance of the proposed improved OSIC algorithm is essentially the same as that of the signal-to-noise ratio (SNR)-based OSIC detection algorithm. The proposed CORDIC-based Givens rotation structure can be heavily time-multiplexed, which significantly increases system parallelism and greatly reduces resource usage. The design reaches a maximum clock frequency of 250 MHz and a matrix decomposition throughput of 1.88 M matrices/s, meeting the receiver-side throughput and latency requirements of MIMO systems with four or more antennas.
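The multiplier-free Givens rotation mentioned in the abstract can be illustrated with a minimal software sketch. It is not the authors' circuit: it handles real-valued rows only, uses floating-point arithmetic in place of fixed-point shifts, and the names `N_ITER`, `cordic_vectoring`, `cordic_rotate` and `givens_zero` are hypothetical. The idea shown is that a CORDIC unit in vectoring mode nulls the leading element and records only the direction of each micro-rotation, and the same direction bits then drive the rotation of the remaining elements, so no angle, sine or cosine is ever computed.

```python
import math

N_ITER = 16  # number of CORDIC micro-rotations (assumed, not taken from the paper)
# Aggregate CORDIC gain: product of sqrt(1 + 2^-2i) over all iterations.
GAIN = math.prod(math.sqrt(1.0 + 2.0 ** (-2 * i)) for i in range(N_ITER))


def cordic_vectoring(x, y):
    """Drive y toward zero with shift-add steps; return the direction bits."""
    flipped = x < 0
    if flipped:                      # pre-rotate by 180 degrees so that x >= 0
        x, y = -x, -y
    dirs = []
    for i in range(N_ITER):
        d = -1 if y > 0 else 1       # rotate toward the x-axis
        # In fixed-point hardware, the factor 2^-i is an arithmetic right shift.
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        dirs.append(d)
    return flipped, dirs


def cordic_rotate(x, y, flipped, dirs):
    """Apply previously recorded micro-rotations to another (x, y) pair."""
    if flipped:
        x, y = -x, -y
    for i, d in enumerate(dirs):
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
    return x, y


def givens_zero(row_a, row_b):
    """Zero row_b[0] against row_a[0]; rotate every column pair identically."""
    flipped, dirs = cordic_vectoring(row_a[0], row_b[0])
    new_a, new_b = [], []
    for a, b in zip(row_a, row_b):
        xa, yb = cordic_rotate(a, b, flipped, dirs)
        new_a.append(xa / GAIN)      # gain compensation (a constant scale factor)
        new_b.append(yb / GAIN)
    return new_a, new_b


new_a, new_b = givens_zero([3.0, 1.0], [4.0, 2.0])
# new_a is approximately [5.0, 2.2] and new_b approximately [0.0, 0.4]
```

The residual in the nulled element shrinks as `N_ITER` grows, which is the usual trade-off between pipeline depth and precision in CORDIC designs. A complex-valued channel matrix would need extra rotation steps that are not shown here.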
Authors
王海麟
冯献礼
辜方林
高明柯
赵海涛
WANG Hailin; FENG Xianli; GU Fanglin; GAO Mingke; ZHAO Haitao (College of Electronic Science and Technology, National University of Defense Technology, Changsha 410073, China; The 32nd Research Institute, China Electronic Technology Group Corporation, Shanghai 201808, China)
Source
《数据采集与处理》
CSCD
PKU Core Journal (北大核心)
2024, No. 6, pp. 1420-1431 (12 pages)
Journal of Data Acquisition and Processing
Funding
National Natural Science Foundation of China (61931020).
Keywords
multiple-input multiple-output (MIMO) signal detection
ordered successive interference cancellation (OSIC)
sorted QR decomposition
Givens rotation
field programmable gate array (FPGA)