Abstract
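Matrix computation is an important application in scientific computing. Although traditional compiler optimizations can greatly improve the performance of regular matrix multiplication, for irregular matrix multiplication they reach only 1% of the performance achieved by domain experts. This paper proposes pattern-based compiler optimization for matrix multiplication: by defining optimization strategies matched to the computation pattern of matrix multiplication, both regularly and irregularly shaped matrix multiplications achieve good performance. Introducing data layout re-organization into the optimization strategy is the key to narrowing the performance gap between irregular and regular matrix multiplication; re-organizing the data layout enables contiguous access to the data elements of an irregular matrix and thereby improves data locality. Experiments show that, compared with a commercial compiler (icc), the pattern-based compiler optimization method improves the performance of regular matrix multiplication by 34% and of irregular matrix multiplication by a factor of 43, and that the method has good scalability.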
Matrix computations play an important role in scientific computing. Traditional compiler optimizations can greatly improve the performance of general matrix multiplication; however, for special matrix multiplication (such as triangular or banded matrices), performance remains very poor even with deep compiler optimizations, reaching only 1% of the performance of code hand-tuned by domain experts. In this paper, we present a pattern-based compiler optimization methodology that treats matrix multiplication as a pattern and defines a specialized optimization strategy for that pattern, which works for both general and special matrix multiplication. The key step of the optimization strategy is data layout re-organization, coupled with loop optimizations such as loop tiling. Data layout optimization re-organizes the matrix data according to the memory access order to improve data locality. Experimental results show that our pattern-based compiler optimization achieves near-peak performance for both general and special matrix multiplication, with improvements of 34% and 43x, respectively, over Intel's compiler (icc), and that the approach exhibits good scalability.
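To make the idea of data layout re-organization concrete, the following is a minimal sketch, not the paper's implementation. The abstract does not specify the actual layout or the tiling scheme, so this assumes a simple row-by-row packed format for a lower-triangular matrix; the function names `pack_lower`, `packed_lower_matmul`, and `dense_matmul` are hypothetical. The point it illustrates is that, after packing, the multiplication kernel reads the triangular operand strictly sequentially instead of skipping over the zero region of each row.

```python
def pack_lower(a):
    """Copy the lower triangle of square matrix `a` into one flat list, row by row.

    Assumed layout (illustrative only): row i contributes its first i + 1 entries.
    """
    n = len(a)
    packed = []
    for i in range(n):
        packed.extend(a[i][: i + 1])
    return packed


def packed_lower_matmul(packed, b, n):
    """Compute C = L @ B, where L is n x n lower triangular in packed form."""
    m = len(b[0])
    c = [[0.0] * m for _ in range(n)]
    pos = 0  # index into `packed`; it only ever advances by one
    for i in range(n):
        ci = c[i]
        for k in range(i + 1):  # only the stored (non-zero) part of row i
            lik = packed[pos]
            pos += 1
            bk = b[k]
            for j in range(m):
                ci[j] += lik * bk[j]
    return c


def dense_matmul(a, b):
    """Reference triple loop over the full, unpacked matrices."""
    n, p, m = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(p)) for j in range(m)]
            for i in range(n)]


lower = [[1.0, 0.0, 0.0],
         [2.0, 3.0, 0.0],
         [4.0, 5.0, 6.0]]
b = [[1.0, 2.0],
     [3.0, 4.0],
     [5.0, 6.0]]

packed = pack_lower(lower)
result = packed_lower_matmul(packed, b, 3)
```

The packed kernel performs n(n+1)/2 row updates instead of n², and it touches the triangular operand with unit stride. Loop tiling, which the abstract names as a companion optimization, is omitted here to keep the sketch short.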
Source
《小型微型计算机系统》
CSCD
Peking University Core Journals (北大核心)
2014, No. 7, pp. 1518-1522 (5 pages)
Journal of Chinese Computer Systems
Funding
Supported by the National Natural Science Foundation of China (60970023, 61202055, 61100011)
Supported by the National High Technology Research and Development Program of China ("863" Program) (2012AA010902, 2012AA010901)
Supported by the National Key Basic Research Program of China ("973" Program) (2011CB302501)
Keywords
matrix multiplication
compiler optimization
data layout re-organization
data locality
scalability