Abstract
Stencil loops are among the most important computational kernels in scientific and engineering applications. Loop tiling can effectively improve the data locality of stencil loops and increase their parallelism, but performance depends strongly on the tile size chosen. Traditional tile size selection methods typically fall short in time overhead, labor cost, or selection accuracy, which limits their practicality. This paper proposes a tile size selection method based on an artificial neural network that predicts the optimal tile size for three-dimensional Jacobi-type stencil loops. The method is evaluated on 11 stencil loops taken from real numerical simulation software. In single-core serial and multi-core parallel runs, code using the predicted tile sizes improves performance over the untiled code by 2% and 35%, respectively. This matches the performance obtained with tile sizes found by grid search, while the online prediction time is only about 1/30,000 of the grid search time. In addition, compared with the static-analysis-based Turbo-tiling method, the tile sizes predicted by the model improve measured performance by about 9% on average.
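To make the terms in the abstract concrete, the following is a minimal sketch, not the authors' code, of a three-dimensional Jacobi-type stencil sweep, its tiled form, and the grid search baseline over tile sizes. Everything in it is an assumption made for illustration: the 7-point stencil with uniform 1/7 weights, the function names `jacobi_sweep`, `jacobi_sweep_tiled`, and `grid_search_tile`, and the use of NumPy. Timings of NumPy slices do not reflect the cache behavior of compiled loops, so the sketch shows the structure of the problem rather than the speedups reported above.

```python
import time
import numpy as np

def jacobi_sweep(src):
    """One untiled 7-point Jacobi sweep over the interior of a 3D grid."""
    dst = src.copy()  # boundary values are carried over unchanged
    dst[1:-1, 1:-1, 1:-1] = (
        src[1:-1, 1:-1, 1:-1]
        + src[:-2, 1:-1, 1:-1] + src[2:, 1:-1, 1:-1]
        + src[1:-1, :-2, 1:-1] + src[1:-1, 2:, 1:-1]
        + src[1:-1, 1:-1, :-2] + src[1:-1, 1:-1, 2:]
    ) / 7.0
    return dst

def jacobi_sweep_tiled(src, tile=(8, 8, 16)):
    """Same sweep, but the interior is visited tile by tile."""
    dst = src.copy()
    nx, ny, nz = src.shape
    tx, ty, tz = tile
    for i0 in range(1, nx - 1, tx):
        i1 = min(i0 + tx, nx - 1)  # clip the last tile to the interior
        for j0 in range(1, ny - 1, ty):
            j1 = min(j0 + ty, ny - 1)
            for k0 in range(1, nz - 1, tz):
                k1 = min(k0 + tz, nz - 1)
                dst[i0:i1, j0:j1, k0:k1] = (
                    src[i0:i1, j0:j1, k0:k1]
                    + src[i0 - 1:i1 - 1, j0:j1, k0:k1]
                    + src[i0 + 1:i1 + 1, j0:j1, k0:k1]
                    + src[i0:i1, j0 - 1:j1 - 1, k0:k1]
                    + src[i0:i1, j0 + 1:j1 + 1, k0:k1]
                    + src[i0:i1, j0:j1, k0 - 1:k1 - 1]
                    + src[i0:i1, j0:j1, k0 + 1:k1 + 1]
                ) / 7.0
    return dst

def grid_search_tile(src, candidates):
    """Baseline: time every candidate tile size and return the fastest one."""
    best_tile, best_time = None, float("inf")
    for tile in candidates:
        start = time.perf_counter()
        jacobi_sweep_tiled(src, tile)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best_tile, best_time = tile, elapsed
    return best_tile
```

The grid search has to run the kernel once per candidate, which is the online cost that a trained predictor avoids: a model maps features of the loop to a tile size without executing any candidate.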
Authors
BAO Yi-kun (包怡坤)
ZHANG Peng (张鹏)
XU Xiao-wen (徐小文)
MO Ze-yao (莫则尧)
Affiliations: Graduate School of China Academy of Engineering Physics, Beijing 100094, China; Institute of Applied Physics and Computational Mathematics, Beijing 100088, China; CAEP Software Center for High Performance Numerical Simulation, Beijing 100088, China; China Academy of Engineering Physics, Mianyang, Sichuan 621900, China
Source
Computer Science (《计算机科学》)
CSCD
Peking University Core Journal (北大核心)
2022, Issue 10, pp. 18-26 (9 pages)
Funding
National Natural Science Foundation of China (62032023).