Abstract
High-performance computing (HPC) is a powerful tool for accelerating Kohn–Sham density functional theory (KS-DFT) calculations on modern heterogeneous supercomputers. Here, we describe the implementation and optimization of a massively parallel discontinuous Galerkin density functional theory (DGDFT) method on the Sunway TaihuLight supercomputer. The DGDFT method uses adaptive local basis (ALB) functions, generated on the fly during the self-consistent field (SCF) iteration, to solve the KS equations with a precision comparable to that of a plane-wave basis set. In particular, the DGDFT method adopts a two-level parallelization strategy that handles various types of data distribution, task scheduling, and data communication schemes, and combines this strategy with the master–slave multi-thread heterogeneous parallelism of the SW26010 processor, enabling large-scale HPC KS-DFT calculations on the Sunway TaihuLight supercomputer. We show that the DGDFT method can scale up to 8,519,680 processing cores (131,072 core groups) on the Sunway TaihuLight supercomputer for studying the electronic structures of two-dimensional (2D) metallic graphene systems that contain tens of thousands of carbon atoms.
Authors
Wei Hu
Xinming Qin
Qingcai Jiang
Junshi Chen
Hong An
Weile Jia
Fang Li
Xin Liu
Dexun Chen
Fangfang Liu
Yuwen Zhao
Jinlong Yang
Hefei National Laboratory for Physical Sciences at the Microscale, Department of Chemical Physics, and Synergetic Innovation Center of Quantum Information and Quantum Physics, University of Science and Technology of China, Hefei 230026, China; School of Computer Science and Technology, University of Science and Technology of China, Hefei 230026, China; Department of Mathematics, University of California, Berkeley, CA 94720, USA; National Supercomputing Center, Wuxi 214072, China; Institute of Software, Chinese Academy of Sciences, Beijing 100190, China
Funding
Partly supported by the Supercomputer Application Project Trail Funding from Wuxi Jiangnan Institute of Computing Technology (BB2340000016),
the Strategic Priority Research Program of Chinese Academy of Sciences (XDC01040100),
the National Natural Science Foundation of China (21688102, 21803066),
the Anhui Initiative in Quantum Information Technologies (AHY090400),
the National Key Research and Development Program of China (2016YFA0200604),
the Fundamental Research Funds for Central Universities (WK2340000091),
the Chinese Academy of Sciences Pioneer Hundred Talents Program (KJ2340000031),
the Research Start-Up Grants (KY2340000094), and
the Academic Leading Talents Training Program (KY2340000103) from University of Science and Technology of China.