
A Scalable Hybrid Algorithm for Solving Partial Differential Equations on a CPU/GPU Cluster

Cited by: 2
Abstract: Several of the top-ranked supercomputers are based on a hybrid architecture consisting of a large number of CPUs and GPUs. High performance has been obtained on them for problems with special structure, such as FFT-based image processing or N-body particle calculations. However, for problems described by partial differential equations (PDEs) discretized by finite difference methods (or other mesh-based methods such as finite elements), obtaining even reasonably good performance on a CPU/GPU cluster is still a challenge. In this paper, we propose and test a hybrid algorithm that matches the architecture of the cluster. Scalability is achieved through a domain decomposition method, and GPU performance is realized by a smoothed-aggregation algebraic multigrid method. Incomplete factorization, which performs well on CPUs but poorly on GPUs, is avoided entirely. Numerical experiments use up to 32 CPU/GPUs to solve PDE problems discretized by the finite difference method with up to 32 million unknowns.
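The two-level structure described in the abstract (an outer domain decomposition wrapped around independent subdomain solves) can be illustrated with a minimal serial sketch. Everything in it is an assumption made for illustration, not the paper's implementation: the model problem is the 1D finite-difference Laplacian, the outer method is conjugate gradients with a plain overlapping additive Schwarz preconditioner, and each subdomain solve is an exact tridiagonal solve standing in for the smoothed-aggregation algebraic multigrid solve that the paper runs on the GPU. All function names are hypothetical.

```python
def apply_poisson(u):
    """Matrix-free product with the 1D Laplacian tridiag(-1, 2, -1)."""
    n = len(u)
    out = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        out.append(2.0 * u[i] - left - right)
    return out


def solve_local(rhs):
    """Exact solve of tridiag(-1, 2, -1) on one subdomain (Thomas algorithm).

    Assumption: this direct solve is only a stand-in for the
    smoothed-aggregation AMG subdomain solve named in the abstract.
    """
    m = len(rhs)
    c = [0.0] * m  # modified upper diagonal
    d = [0.0] * m  # modified right-hand side
    c[0] = -0.5
    d[0] = rhs[0] / 2.0
    for i in range(1, m):
        denom = 2.0 + c[i - 1]
        c[i] = -1.0 / denom
        d[i] = (rhs[i] + d[i - 1]) / denom
    x = [0.0] * m
    x[-1] = d[-1]
    for i in range(m - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def make_subdomains(n, parts, overlap):
    """Split indices 0..n-1 into contiguous, overlapping index ranges."""
    size = n // parts
    doms = []
    for p in range(parts):
        lo = max(0, p * size - overlap)
        hi = n if p == parts - 1 else min(n, (p + 1) * size + overlap)
        doms.append((lo, hi))
    return doms


def additive_schwarz(r, doms):
    """z = sum_i R_i^T A_i^{-1} R_i r; each term is independent of the others."""
    z = [0.0] * len(r)
    for lo, hi in doms:  # on a cluster, one subdomain per CPU/GPU pair
        local = solve_local(r[lo:hi])
        for k, v in enumerate(local):
            z[lo + k] += v
    return z


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pcg(b, doms, tol=1e-8, max_iter=500):
    """Preconditioned conjugate gradients; returns (solution, iterations)."""
    x = [0.0] * len(b)
    r = list(b)
    z = additive_schwarz(r, doms)
    p = list(z)
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5
    for it in range(1, max_iter + 1):
        Ap = apply_poisson(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, it
        z = additive_schwarz(r, doms)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter
```

A call such as `pcg([1.0] * 64, make_subdomains(64, 4, 2))` solves the model problem on four overlapping subdomains. The loop in `additive_schwarz` is the part that would be distributed, since each subdomain solve needs only its own slice of the residual.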
Source: Journal of Integration Technology (《集成技术》), 2012, No. 1, pp. 84-88 (5 pages)
Keywords: PDEs; CPU/GPU cluster; domain decomposition; algebraic multigrid; scalable algorithm

