Abstract
This paper optimizes a high-order-accuracy, structured-grid parallel CFD (Computational Fluid Dynamics) program from the perspective of uniprocessor (single-node) performance. Performance-critical variables are identified and converted into constant parameters so that the compiler can apply more targeted, higher-level optimizations. Based on the structure and access patterns of the program's data, a multi-level data buffering technique is designed that lets the main computational code access the main data structures more efficiently, improving the spatial locality of memory accesses. Several loop transformations are also applied to improve memory access performance. Tests on the "Tianhe-1A" parallel computer at the National Supercomputing Center in Changsha show that, relative to the version compiled with the Intel compiler at its highest optimization level, serial performance improves by about 22.2% to 28.9% for a 2D airfoil test case with 1 million grid points, and parallel performance improves by about 13.9% to 20.2% for a delta wing test case with 112 million grid points.
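The abstract does not show the code itself, so the following is only a minimal C sketch of two of the ideas it names, under stated assumptions. `HALF_WIDTH`, `MAX_LINE`, and all function names are hypothetical and do not come from the paper. The first pair of functions contrasts a stencil whose width is a runtime variable with one whose width is a compile-time constant. The last function shows one generic form of line buffering, in which a strided grid line is gathered into a contiguous buffer before the stencil is applied. The paper's actual multi-level scheme may differ.

```c
/* Hypothetical stencil half-width. Fixing it at compile time (instead of
   reading it from a runtime variable) gives the compiler a known trip count,
   so it can fully unroll and vectorize the inner loop. */
#define HALF_WIDTH 2
/* Assumed upper bound on the number of points along one grid line. */
#define MAX_LINE 1024

/* Generic version: the stencil width is only known at run time. */
double stencil_runtime(const double *u, int i, const double *c, int half_width)
{
    double s = 0.0;
    for (int k = -half_width; k <= half_width; ++k)
        s += c[k + half_width] * u[i + k];
    return s;
}

/* Parameterized version: same arithmetic, constant loop bounds. */
static double stencil_const(const double *u, int i, const double *c)
{
    double s = 0.0;
    for (int k = -HALF_WIDTH; k <= HALF_WIDTH; ++k)
        s += c[k + HALF_WIDTH] * u[i + k];
    return s;
}

/* Apply the stencil along the strided (i) direction of a row-major
   ni x nj array. Each grid line is first gathered into a contiguous
   buffer so that the stencil loop reads unit-stride data.
   Returns 0 on success, -1 if the line does not fit in the buffer. */
int derivative_along_i(const double *u, double *du, int ni, int nj,
                       const double *c)
{
    double line[MAX_LINE];
    if (ni > MAX_LINE)
        return -1;
    for (int j = 0; j < nj; ++j) {
        for (int i = 0; i < ni; ++i)          /* gather: stride nj */
            line[i] = u[i * nj + j];
        for (int i = HALF_WIDTH; i < ni - HALF_WIDTH; ++i)
            du[i * nj + j] = stencil_const(line, i, c);
    }
    return 0;
}
```

With the fourth-order central-difference coefficients `{1/12, -8/12, 0, 8/12, -1/12}` and unit grid spacing, `derivative_along_i` returns the exact slope of a linear field at interior points. Boundary points are left untouched in this sketch.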
Source
Computer Science (《计算机科学》)
CSCD
Peking University Core Journal
2013, No. 3, pp. 116-120 (5 pages)
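Funding
Supported by the National Basic Research Program of China (973 Program) (G2009CB723803)
and the National Natural Science Foundation of China (11272352, 61103014, 60603055)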
Keywords
Parallel CFD, uniprocessor performance tuning, key variable parameterization, multi-level data buffering