
Research on Multi-Level Parallel Algorithm of GPU Based QR Decomposition
Times cited: 4
Abstract: QR decomposition is a fundamental computational module widely used in image processing, signal processing, communication engineering, and many other fields. Traditional parallel QR decomposition algorithms exploit only the data-level parallelism in the computation. Based on an analysis of the characteristics of the Fast Givens Rotation algorithm, this paper proposes a multi-level parallel algorithm that exploits task-level and data-level parallelism simultaneously, which makes it well suited to massively parallel processors such as graphics processing units (GPUs). In addition, the GPU-based parallel QR decomposition can serve as a basic module that the many applications on GPU platforms call directly. Experimental results show that, compared with an OpenMP implementation on the CPU, the GPU-based multi-level parallel algorithm achieves a speedup of more than 5x, and a singular value decomposition (SVD) application that calls the GPU QR decomposition module achieves a speedup of more than 3x.
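The abstract does not spell out how the two levels of parallelism are organized, so the following is only a minimal Python sketch of the general idea under stated assumptions. It uses the standard (square-root-based) Givens rotation rather than the Fast Givens variant the paper builds on (Fast Givens avoids the square root by carrying a diagonal scaling matrix), and it assumes the classic Sameh-Kuck ordering of rotations, which may differ from the paper's own scheme. Rotations within one stage touch disjoint row pairs and are therefore independent tasks (task-level parallelism); each rotation updates its two rows column by column with no dependence between columns (data-level parallelism). The function names `sameh_kuck_stages`, `apply_rotation`, and `givens_qr` are hypothetical.

```python
import math


def sameh_kuck_stages(m, n):
    """Group the Givens rotations for an m x n matrix into parallel stages.

    Entry (i, j) with i > j is zeroed by rotating adjacent rows (i - 1, i)
    at stage m - 1 - i + 2 * j (Sameh-Kuck ordering). Rotations in the same
    stage touch disjoint row pairs, so they are independent tasks.
    """
    stages = {}
    for j in range(min(n, m - 1)):
        for i in range(j + 1, m):
            t = m - 1 - i + 2 * j
            stages.setdefault(t, []).append((i - 1, i, j))
    return [stages[t] for t in sorted(stages)]


def apply_rotation(rows, p, q, c, s):
    """Apply the rotation [[c, s], [-s, c]] to rows p and q in place.

    Every column is updated independently of the others; on a GPU these
    updates could map to separate threads (data-level parallelism).
    """
    rp, rq = rows[p], rows[q]
    for col in range(len(rp)):
        x, y = rp[col], rq[col]
        rp[col] = c * x + s * y
        rq[col] = -s * x + c * y


def givens_qr(a):
    """Return (Q, R) with a = Q R, Q orthogonal (m x m), R upper triangular."""
    m, n = len(a), len(a[0])
    r = [[float(v) for v in row] for row in a]
    # qt accumulates Q^T: every rotation applied to r is also applied to qt.
    qt = [[1.0 if i == k else 0.0 for k in range(m)] for i in range(m)]
    for stage in sameh_kuck_stages(m, n):
        # The rotations of one stage are mutually independent (task-level
        # parallelism); in this sequential sketch they simply run in order.
        for p, q, j in stage:
            x, y = r[p][j], r[q][j]
            if y == 0.0:
                continue  # entry is already zero, no rotation needed
            h = math.hypot(x, y)
            c, s = x / h, y / h
            apply_rotation(r, p, q, c, s)
            apply_rotation(qt, p, q, c, s)
            r[q][j] = 0.0  # remove rounding residue in the annihilated entry
    q_mat = [list(col) for col in zip(*qt)]  # Q is the transpose of Q^T
    return q_mat, r
```

For example, `givens_qr([[12, -51, 4], [6, 167, -68], [-4, 24, -41]])` returns Q and R whose product reproduces the input up to rounding error. For a square n x n matrix the ordering needs 2n - 3 stages: `sameh_kuck_stages(4, 4)` yields 5 stages, and its third stage contains two independent rotations, on rows (0, 1) and (2, 3). A GPU version would launch each stage as one batch, with the rotations as parallel tasks and the column updates inside each rotation as parallel threads.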
Source: Computer Simulation (《计算机仿真》), 2013, No. 9, pp. 234-238 (5 pages); indexed in CSCD; Peking University Core Journal
Funding: National Natural Science Foundation of China (61272085)
Keywords: QR decomposition; graphics processing unit (GPU); multi-level parallelism; Fast Givens Rotation