Abstract
[Background] ARM many-core processors, owing to their high performance, high parallelism, and low power consumption, play an increasingly important role in fields such as molecular dynamics, fluid dynamics, and weather simulation. [Limitation] However, molecular dynamics software decomposes work along different dimensions at runtime (e.g., by particle interactions or by spatial and temporal domain decomposition) and employs diverse parallelization strategies. The resulting workload characteristics vary widely and match poorly with the highly parallel compute resources of many-core processors, so individual compute units run at low efficiency. This mismatch has become one of the major bottlenecks limiting runtime performance. [Method] To address this problem, this paper takes the ARM-based Kunpeng 920 processor, independently developed by Huawei Technologies Co., Ltd., and the GROMACS software as its research subjects. Through an in-depth analysis of the Kunpeng 920's architectural features and computational characteristics, together with GROMACS's task decomposition and parallel execution process, we propose a runtime parallel-parameter optimization strategy that better aligns the software's computational demands with the hardware's computational capabilities and thereby improves computational performance. [Result] By systematically analyzing performance bottlenecks and applying the optimization strategy, we achieve a 16.9% speedup over the unoptimized baseline. [Conclusion] These results can serve as a reference for optimizing molecular dynamics simulations in many-core computing environments, and for developing domestically produced high-performance computing systems and special-purpose machines for molecular dynamics simulation.
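As an illustration of what "runtime parallel parameters" can mean for GROMACS, the sketch below enumerates one possible search space: the split between thread-MPI ranks (`-ntmpi`) and OpenMP threads per rank (`-ntomp`) that exactly fills a node. It is a hypothetical sketch, not the authors' strategy. The node shape (128 cores, 32 cores per NUMA domain) and the rule that a rank's threads should stay inside one NUMA domain are assumptions chosen for the example.

```python
def candidate_layouts(total_cores: int, cores_per_numa: int) -> list[tuple[int, int]]:
    """Return (ranks, threads_per_rank) pairs that use every core exactly once.

    Assumption: a layout is kept only if threads_per_rank divides cores_per_numa,
    so no rank's OpenMP threads have to span two NUMA domains.
    """
    layouts = []
    for threads in range(1, cores_per_numa + 1):
        if cores_per_numa % threads == 0 and total_cores % threads == 0:
            layouts.append((total_cores // threads, threads))
    return layouts


def mdrun_command(ranks: int, threads: int) -> str:
    """Build a thread-MPI mdrun command line for one (ranks, threads) layout."""
    return f"gmx mdrun -ntmpi {ranks} -ntomp {threads} -pin on"


if __name__ == "__main__":
    # Hypothetical node: 128 cores in total, 32 cores per NUMA domain (assumed values).
    for ranks, threads in candidate_layouts(total_cores=128, cores_per_numa=32):
        print(mdrun_command(ranks, threads))
```

In practice each generated command would be run on the same input and its throughput (ns/day, reported in the mdrun log) compared. A fuller search would also vary other options such as the number of separate PME ranks (`-npme`).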
Authors
YUAN Huifeng
LU Teng
ZHU Yanchao
YAN Chen
MA Yingjin
LIU Qian
JIN Zhong
Affiliations: Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China; University of Chinese Academy of Sciences, Beijing 100190, China; HPC Lab., Huawei Technologies Co., Ltd., Hangzhou, Zhejiang 310053, China
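Funding
National Key Research and Development Program of China, "Scientific Computing Application Platform for Multi-Physics Complex Systems" (No. 2020YFB0204802)
National Natural Science Foundation of China, "Development of High-Performance Computing Programs for the Density Matrix Renormalization Group and Its Derived Methods" (No. 22173114)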