
Shape-oblivious Parallel Compiler Optimization for Matrix Computations (矩阵形状无关的并行编译优化)

Cited by: 2
Abstract: Matrix computations play an important role in scientific computing. Traditional compiler optimizations can greatly improve the performance of general (regularly shaped) matrix multiplication. For special, irregularly shaped matrix multiplication (such as triangular or banded matrices), however, performance remains very poor even with deep compiler optimizations: only 1% of the performance of domain experts' hand-tuned code. This paper presents a pattern-based compiler optimization methodology that treats matrix multiplication as a computation pattern and defines a specialized optimization strategy for that pattern, which works for both general and special matrix multiplication. The key step of the strategy, and the key to narrowing the performance gap between the two cases, is data layout re-organization, coupled with loop optimizations such as loop tiling. Data layout re-organization rearranges the matrix data according to the memory access order, so that the elements of an irregular matrix are accessed contiguously, which improves data locality. Experimental results show that the pattern-based compiler optimization achieves near-peak performance for both general and special matrix multiplication, with a 34% improvement and a 43X speedup, respectively, over Intel's compiler (icc), and that the approach exhibits good scalability.
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), indexed in CSCD and the Peking University Core Journals list (北大核心), 2014, No. 7, pp. 1518-1522 (5 pages)
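Funding: Supported by the National Natural Science Foundation of China (60970023, 61202055, 61100011), the National High Technology Research and Development Program of China ("863" Program; 2012AA010902, 2012AA010901), and the National Key Basic Research Program of China ("973" Program; 2011CB302501).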
Keywords: matrix multiplication; compiler optimization; data layout re-organization; data locality; scalability
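
The abstract describes data layout re-organization only in general terms, so the following is a minimal sketch of the idea rather than the paper's method. It assumes a lower-triangular matrix and a row-by-row packed layout; the function names (`pack_lower`, `row_start`, `trmm_packed`, `matmul_dense`) are hypothetical and chosen for illustration. Packing copies only the nonzero triangle into one contiguous array, so the inner loops read consecutive addresses instead of striding over a mostly empty square.

```python
# Illustrative sketch only; the layout and names are assumptions, not the paper's code.

def pack_lower(a):
    """Copy the lower triangle of square matrix `a` into one contiguous list, row by row."""
    n = len(a)
    return [a[i][k] for i in range(n) for k in range(i + 1)]


def row_start(i):
    """Offset of row i in the packed layout: rows 0..i-1 hold 1 + 2 + ... + i elements."""
    return i * (i + 1) // 2


def trmm_packed(packed, b, n):
    """Compute C = L * B, where L is n x n lower triangular (packed) and B is dense."""
    m = len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        base = row_start(i)
        row_c = c[i]
        for k in range(i + 1):        # visit only the stored part of row i
            lik = packed[base + k]    # consecutive positions as k increases
            row_b = b[k]
            for j in range(m):        # unit-stride walk over rows of B and C
                row_c[j] += lik * row_b[j]
    return c


def matmul_dense(a, b):
    """Reference triple loop over the full square matrix, zeros included."""
    p, m = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(p)) for j in range(m)] for row in a]


L = [[1.0, 0.0, 0.0],
     [2.0, 3.0, 0.0],
     [4.0, 5.0, 6.0]]
B = [[1.0, 2.0],
     [3.0, 4.0],
     [5.0, 6.0]]

packed = pack_lower(L)
assert trmm_packed(packed, B, 3) == matmul_dense(L, B)
```

In a compiled setting the same packed array would typically be combined with loop tiling, which the abstract names as a companion optimization; that step is omitted here to keep the sketch short.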
