Abstract
QR decomposition is a fundamental computational building block widely used in image processing, signal processing, communications engineering, and many other fields. Traditional parallel QR decomposition algorithms exploit only the data-level parallelism in the computation. Based on an analysis of the characteristics of the fast Givens rotation decomposition, this paper proposes a multi-level parallel algorithm that exploits task-level and data-level parallelism simultaneously, making it well suited to massively parallel processors such as graphics processing units (GPUs). The GPU-based parallel QR decomposition can also serve as a basic computational module called directly by many applications on the GPU platform. Experimental results show that, compared with an OpenMP implementation on the CPU, the GPU-based multi-level parallel algorithm achieves a speedup of more than 5x, and a singular value decomposition (SVD) application that calls the QR decomposition module achieves a speedup of more than 3x.
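As an illustration of the kind of computation the abstract refers to, the following is a minimal serial sketch of QR decomposition by Givens rotations. It is not the paper's algorithm: it uses plain Givens rotations rather than the fast (square-root-free) variant, runs on the CPU, and the function name `givens_qr` is hypothetical. The comments mark where the two levels of parallelism could plausibly arise, which is an assumption about how the abstract's terms map onto the computation.

```python
import math

def givens_qr(a):
    """Return (q, r) with a = q * r, using plain (not fast) Givens rotations."""
    m, n = len(a), len(a[0])
    r = [[float(x) for x in row] for row in a]      # working copy, becomes R
    qt = [[float(i == j) for j in range(m)] for i in range(m)]  # accumulates Q^T
    for j in range(n):                              # column being cleared
        # Assumed task-level parallelism: rotations that touch disjoint
        # row pairs do not depend on each other and could run concurrently.
        for i in range(m - 1, j, -1):               # zero r[i][j] using row i-1
            x, y = r[i - 1][j], r[i][j]
            if y == 0.0:
                continue
            h = math.hypot(x, y)
            c, s = x / h, y / h
            # Assumed data-level parallelism: every column k of the two
            # rows is updated independently of the other columns.
            for mat in (r, qt):
                for k in range(len(mat[0])):
                    u, v = mat[i - 1][k], mat[i][k]
                    mat[i - 1][k] = c * u + s * v
                    mat[i][k] = -s * u + c * v
    q = [[qt[j][i] for j in range(m)] for i in range(m)]  # Q is the transpose of Q^T
    return q, r
```

Calling `givens_qr` on a small square matrix returns an orthogonal `q` and an upper-triangular `r` (up to floating-point rounding) whose product reproduces the input.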
Source
Computer Simulation (《计算机仿真》)
CSCD
Peking University Core Journal (北大核心)
2013, Issue 9, pp. 234-238 (5 pages)
Funding
National Natural Science Foundation of China (61272085)