Prediction of Optimal Loop Tiling Size for Stencil Computation Based on a Neural Network Model
Abstract: Stencil loops are among the most important computational kernels in scientific and engineering applications. Loop tiling can effectively improve the data locality of stencil loops and increase their degree of parallelism. The choice of tile size strongly affects stencil performance, yet traditional tile size selection methods usually fall short in time overhead, manual effort, or selection accuracy, which limits their practicality. This paper proposes a tile size selection method based on an artificial neural network to predict the optimal tile size for three-dimensional Jacobi-type stencil loops. The method is evaluated on 11 stencil loops taken from real numerical simulation software. In single-core serial and multi-core parallel runs, programs using the model-predicted tile sizes outperform their untiled versions by 2% and 35%, respectively. The resulting performance is comparable to that of tiles found by grid search, while the online prediction time is only about 1/30,000 of the grid search time. In addition, compared with Turbo-tiling, a method based on static analysis, the predicted tile sizes improve measured performance by about 9% on average.
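To make the term "tile size" concrete, the following is a minimal sketch of one sweep of a 7-point three-dimensional Jacobi stencil, written once as a plain triple loop and once with the iteration space split into tiles. It is not the authors' code: the grid layout, the averaging coefficients, and the names `make_grid`, `update_point`, `jacobi_untiled`, `jacobi_tiled`, and the tile parameters `ti`, `tj`, `tk` are all assumptions made for illustration. Pure Python will not show the cache benefits of tiling; the sketch only shows the loop structure whose tile parameters a selection method would have to choose.

```python
# Illustrative sketch only; names, coefficients, and layout are assumptions,
# not taken from the paper.

def make_grid(n):
    """Build an n x n x n nested list with a deterministic fill pattern."""
    return [[[float((i * 31 + j * 17 + k * 7) % 13) for k in range(n)]
             for j in range(n)] for i in range(n)]

def update_point(a, b, i, j, k):
    """Jacobi update: b reads only from a, so points can be visited in any order."""
    b[i][j][k] = (a[i][j][k]
                  + a[i - 1][j][k] + a[i + 1][j][k]
                  + a[i][j - 1][k] + a[i][j + 1][k]
                  + a[i][j][k - 1] + a[i][j][k + 1]) / 7.0

def jacobi_untiled(a, b, n):
    """One sweep over all interior points in plain i, j, k order."""
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            for k in range(1, n - 1):
                update_point(a, b, i, j, k)

def jacobi_tiled(a, b, n, ti, tj, tk):
    """Same sweep, with the interior split into tiles of at most ti x tj x tk points."""
    for ii in range(1, n - 1, ti):              # loops over tile origins
        for jj in range(1, n - 1, tj):
            for kk in range(1, n - 1, tk):
                for i in range(ii, min(ii + ti, n - 1)):   # loops inside one tile
                    for j in range(jj, min(jj + tj, n - 1)):
                        for k in range(kk, min(kk + tk, n - 1)):
                            update_point(a, b, i, j, k)

n = 10
a = make_grid(n)
b_ref = make_grid(n)
b_tiled = make_grid(n)
jacobi_untiled(a, b_ref, n)
jacobi_tiled(a, b_tiled, n, 4, 3, 5)   # hypothetical tile size (4, 3, 5)
same = (b_ref == b_tiled)              # tiling reorders the work but not the result
```

Because a Jacobi sweep writes to a separate output array, any tile size gives the same result, so the choice of `(ti, tj, tk)` affects only performance. That is why the tile size can be treated as a pure tuning parameter.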
Authors: BAO Yi-kun (包怡坤); ZHANG Peng (张鹏); XU Xiao-wen (徐小文); MO Ze-yao (莫则尧) (Graduate School of China Academy of Engineering Physics, Beijing 100094, China; Institute of Applied Physics and Computational Mathematics, Beijing 100088, China; CAEP Software Center for High Performance Numerical Simulation, Beijing 100088, China; China Academy of Engineering Physics, Mianyang, Sichuan 621900, China)
Source: Computer Science (《计算机科学》), CSCD, Peking University Core Journal, 2022, No. 10, pp. 18-26 (9 pages)
Funding: National Natural Science Foundation of China (62032023).
Keywords: Stencil computation; Loop tiling; Machine learning; Artificial neural network
