摘要
格点量子色动力学(格点QCD)是高能物理领域中需要大规模并行计算的最主要应用之一,相关研究通常需要消耗大量计算资源,核心是求解大规模稀疏线性方程组。文中基于国产鲲鹏920 ARM处理器,研究了格点QCD的计算热点Dslash,并将其扩展到64个节点(6 144核),展示了格点QCD计算的线性扩展性。基于roofline性能分析模型,发现格点QCD是典型的内存限制应用,并通过将Dslash中的3×3复幺正矩阵根据对称性压缩,将其性能提升约22%。对于大规模稀疏线性方程的求解,在ARM处理器上探索了常用的Krylov子空间迭代算法BiCGStab,以及近年来发展起来的前沿的multigrid算法,发现即使考虑预处理时间,在实际物理计算中使用multigrid算法相比BiCGStab依然有几倍至一个数量级的加速。此外,还考虑了鲲鹏920处理器上的NEON向量化指令,发现将其用于multigrid计算时可以带来约20%的加速。因此,在ARM处理器上使用multigrid算法能极大地加速实际的物理研究。
Lattice quantum chromodynamics(lattice QCD)is one of the most important applications of large-scale parallel computing in high energy physics,researches in this field usually consume a large amount of computing resources,and its core is to solve the large scale sparse linear equations.Based on the domestic Kunpeng 920 ARM processor,this paper studies the hot spot of lattice QCD calculation,the Dslash,which is applied on up to 64 nodes(6144 cores)and show the linear scalability.Based on the roofline performance analysis model,we find that lattice QCD is a typical memory bound application,and by using the compression of 3×3 complex unitary matrices in Dslash based on symmetry,we can improve the performance of Dslash by 22%.For the solving of large scale sparse linear equations,we also explore the usual Krylov subspace iterative algorithm such as BiCGStab and the newly developed state-of-art multigrid algorithm on the same ARM processor,and find that in the practical physics calculation the multigrid algorithm is several times to a magnitude faster than BiCGStab,even including the multigrid setup time.Moreover,we consider the NEON vectorization instructions on Kunpeng 920,and there is up to 20%improvement for multigrid algorithm.Therefore,the use of multigrid algorithm on ARM processors can speed up the physics research tremendously.
作者
孙玮
毕玉江
程耀东
SUN Wei;BI Yujiang;CHENG Yaodong(Institute of High Energy Physics,Chinese Academy of Sciences,Beijing 100049,China;Tianfu Cosmic Ray Research Center,Chengdu 610213,China;School of Nuclear Science and Technology,University of Chinese Academy of Sciences,Beijing 100049,China)
出处
《计算机科学》
CSCD
北大核心
2023年第6期52-57,共6页
Computer Science
基金
国家自然科学基金(12205311,11935017,12075253,12175063)
中国博士后科学基金面上资助(2022M713154)
高能物理研究所科技创新项目(E15451U2)。