基于异构众核处理器的超级计算机已经成为TOP500高性能计算机的主流,BSP、LogP、PRAM等已有并行计算模型均针对基于多核处理器的超级计算机设计,不能满足日益迫切的基于众核架构的超级计算机和应用发展需求.本文面向“神威·太湖之...基于异构众核处理器的超级计算机已经成为TOP500高性能计算机的主流,BSP、LogP、PRAM等已有并行计算模型均针对基于多核处理器的超级计算机设计,不能满足日益迫切的基于众核架构的超级计算机和应用发展需求.本文面向“神威·太湖之光”和神威E级原型系统的众核体系结构特点,提出P-PALN(Parallel-Parallel Access via LDM&NOC)并行计算模型,对于计算节点间的并行,该模型沿用BSP/LogP模型描述;对于计算节点内的众核并行,该模型提供私有存储访问和片上阵列通信的众核并行架构的有效描述PALN,能够协助用户进行众核并行算法设计,并在申威众核处理器硬件设计中指导参数的优化.实验结果表明,该模型可有效指导硬件设计和用户众核编程,从而提高系统和应用的性能.展开更多
High performance computing(HPC)is a powerful tool to accelerate the Kohn–Sham density functional theory(KS-DFT)calculations on modern heterogeneous supercomputers.Here,we describe a massively parallel implementation ...High performance computing(HPC)is a powerful tool to accelerate the Kohn–Sham density functional theory(KS-DFT)calculations on modern heterogeneous supercomputers.Here,we describe a massively parallel implementation of discontinuous Galerkin density functional theory(DGDFT)method on the Sunway Taihu Light supercomputer.The DGDFT method uses the adaptive local basis(ALB)functions generated on-the-fly during the self-consistent field(SCF)iteration to solve the KS equations with high precision comparable to plane-wave basis set.In particular,the DGDFT method adopts a two-level parallelization strategy that deals with various types of data distribution,task scheduling,and data communication schemes,and combines with the master–slave multi-thread heterogeneous parallelism of SW26010 processor,resulting in large-scale HPC KS-DFT calculations on the Sunway Taihu Light supercomputer.We show that the DGDFT method can scale up to 8,519,680 processing cores(131,072 core groups)on the Sunway Taihu Light supercomputer for studying the electronic structures of twodimensional(2 D)metallic graphene systems that contain tens of thousands of carbon atoms.展开更多
文摘基于异构众核处理器的超级计算机已经成为TOP500高性能计算机的主流,BSP、LogP、PRAM等已有并行计算模型均针对基于多核处理器的超级计算机设计,不能满足日益迫切的基于众核架构的超级计算机和应用发展需求.本文面向“神威·太湖之光”和神威E级原型系统的众核体系结构特点,提出P-PALN(Parallel-Parallel Access via LDM&NOC)并行计算模型,对于计算节点间的并行,该模型沿用BSP/LogP模型描述;对于计算节点内的众核并行,该模型提供私有存储访问和片上阵列通信的众核并行架构的有效描述PALN,能够协助用户进行众核并行算法设计,并在申威众核处理器硬件设计中指导参数的优化.实验结果表明,该模型可有效指导硬件设计和用户众核编程,从而提高系统和应用的性能.
基金partly supported by the Supercomputer Application Project Trail Funding from Wuxi Jiangnan Institute of Computing Technology(BB2340000016)the Strategic Priority Research Program of Chinese Academy of Sciences(XDC01040100)+6 种基金the National Natural Science Foundation of China(21688102,21803066)the Anhui Initiative in Quantum Information Technologies(AHY090400)the National Key Research and Development Program of China(2016YFA0200604)the Fundamental Research Funds for Central Universities(WK2340000091)the Chinese Academy of Sciences Pioneer Hundred Talents Program(KJ2340000031)the Research Start-Up Grants(KY2340000094)the Academic Leading Talents Training Program(KY2340000103)from University of Science and Technology of China。
文摘High performance computing(HPC)is a powerful tool to accelerate the Kohn–Sham density functional theory(KS-DFT)calculations on modern heterogeneous supercomputers.Here,we describe a massively parallel implementation of discontinuous Galerkin density functional theory(DGDFT)method on the Sunway Taihu Light supercomputer.The DGDFT method uses the adaptive local basis(ALB)functions generated on-the-fly during the self-consistent field(SCF)iteration to solve the KS equations with high precision comparable to plane-wave basis set.In particular,the DGDFT method adopts a two-level parallelization strategy that deals with various types of data distribution,task scheduling,and data communication schemes,and combines with the master–slave multi-thread heterogeneous parallelism of SW26010 processor,resulting in large-scale HPC KS-DFT calculations on the Sunway Taihu Light supercomputer.We show that the DGDFT method can scale up to 8,519,680 processing cores(131,072 core groups)on the Sunway Taihu Light supercomputer for studying the electronic structures of twodimensional(2 D)metallic graphene systems that contain tens of thousands of carbon atoms.