Abstract
Several of the top-ranked supercomputers are based on a hybrid architecture that combines a large number of CPUs and GPUs. High performance has been obtained on such machines for problems with special structure, such as FFT-based image processing or N-body particle calculations. However, for problems described by partial differential equations (PDEs) discretized by finite difference methods (or other mesh-based methods such as finite elements), obtaining even reasonably good performance on a CPU/GPU cluster remains a challenge. In this paper, we propose and test a hybrid algorithm that matches the architecture of the cluster. Scalability is achieved with a domain decomposition method, and GPU performance is realized with a smoothed aggregation based algebraic multigrid method. Incomplete factorization, which performs well on CPUs but poorly on GPUs, is avoided entirely. Numerical experiments are carried out on up to 32 CPU/GPUs, solving PDE problems discretized by the finite difference method with up to 32 million unknowns.
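The two-level idea in the abstract (domain decomposition across nodes, an independent local solve on each node) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not the paper's implementation: the model problem is a small 1D finite-difference Laplacian, the subdomains do not overlap (block Jacobi), and a dense direct inverse stands in for the smoothed aggregation AMG solve that would run on each GPU. The function names are hypothetical.

```python
import numpy as np

def poisson_1d(n):
    """Dense 1D finite-difference Laplacian (tridiagonal -1, 2, -1)."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def block_jacobi_preconditioner(A, num_subdomains):
    """Build a one-level, non-overlapping domain decomposition preconditioner.

    Each subdomain owns a contiguous index block. The local inverse is a
    placeholder (assumption) for a per-subdomain AMG solve on one GPU.
    """
    n = A.shape[0]
    blocks = np.array_split(np.arange(n), num_subdomains)
    local_inverses = [np.linalg.inv(A[np.ix_(idx, idx)]) for idx in blocks]

    def apply(r):
        z = np.zeros_like(r)
        # The local solves are independent, so they could run in parallel.
        for idx, inv in zip(blocks, local_inverses):
            z[idx] = inv @ r[idx]
        return z

    return apply

def preconditioned_cg(A, b, apply_M, tol=1e-10, max_iter=500):
    """Preconditioned conjugate gradients for a symmetric positive definite A."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = apply_M(r)
    p = z.copy()
    rz = r @ z
    b_norm = np.linalg.norm(b)
    for k in range(1, max_iter + 1):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * b_norm:
            return x, k
        z = apply_M(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter

A = poisson_1d(64)
b = np.ones(64)
x, iterations = preconditioned_cg(A, b, block_jacobi_preconditioner(A, 4))
```

In a real CPU/GPU cluster setting, each block would live on a different node and the outer Krylov iteration would exchange only interface data; the sketch keeps everything in one process to show the structure of the preconditioner.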
Source
Journal of Integration Technology (《集成技术》)
2012, No. 1, pp. 84-88 (5 pages)