Abstract
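The Jacobi iteration algorithm is a common loop computation for solving systems of partial differential equations. Data dependences between the statements of the algorithm hinder its efficient implementation on parallel computing platforms such as the Graphics Processing Unit (GPU). Through mathematical proof and experimental verification, different loop optimization strategies are compared that eliminate the inter-statement data dependences and enhance data locality, thereby achieving higher execution performance. In addition, a tile size selection model is used to partition the computation data appropriately and make full use of the GPU's computing resources, further improving performance. Experimental results show that the Jacobi odd-even duplication algorithm outperforms the traditional parallel Jacobi algorithm on the GPU by more than a factor of four.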
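To make the inter-statement data dependence concrete, here is a minimal Python sketch of a 2-D five-point Jacobi sweep. It assumes the common textbook formulation, in which one statement writes array `b` from `a` and a second statement copies `b` back into `a`. The function names are hypothetical, and the buffer-swapping ("ping-pong") variant is a generic technique shown for illustration; it may resemble, but is not claimed to reproduce, the paper's odd-even duplication algorithm. The third function shows why the dependence cannot simply be ignored: updating `a` in place reads neighbours that were already updated in the same sweep (Gauss-Seidel ordering) and changes the result.

```python
# Minimal sketch of a 2-D five-point Jacobi stencil (hypothetical helper names).
# Grids are lists of lists of floats; boundary rows and columns stay fixed.

def copy_grid(g):
    return [row[:] for row in g]

def jacobi_copy_back(a, steps):
    """Textbook form: statement 1 computes b from a, statement 2 copies b into a."""
    a, b = copy_grid(a), copy_grid(a)
    n, m = len(a), len(a[0])
    for _ in range(steps):
        for i in range(1, n - 1):          # statement 1: reads only old values in a
            for j in range(1, m - 1):
                b[i][j] = 0.2 * (a[i][j] + a[i][j - 1] + a[i][j + 1]
                                 + a[i - 1][j] + a[i + 1][j])
        for i in range(1, n - 1):          # statement 2: must wait for statement 1
            for j in range(1, m - 1):
                a[i][j] = b[i][j]
    return a

def jacobi_ping_pong(a, steps):
    """Generic ping-pong variant (an assumption, not the paper's code): odd sweeps
    write dst from src, even sweeps the reverse, so the copy-back statement vanishes."""
    src, dst = copy_grid(a), copy_grid(a)  # both hold the same fixed boundary
    n, m = len(src), len(src[0])
    for _ in range(steps):
        for i in range(1, n - 1):
            for j in range(1, m - 1):
                dst[i][j] = 0.2 * (src[i][j] + src[i][j - 1] + src[i][j + 1]
                                   + src[i - 1][j] + src[i + 1][j])
        src, dst = dst, src                # swap roles instead of copying
    return src

def in_place_sweeps(a, steps):
    """Ignoring the dependence: updating a in place reads already-updated
    neighbours (Gauss-Seidel ordering), so the result is no longer Jacobi's."""
    a = copy_grid(a)
    n, m = len(a), len(a[0])
    for _ in range(steps):
        for i in range(1, n - 1):
            for j in range(1, m - 1):
                a[i][j] = 0.2 * (a[i][j] + a[i][j - 1] + a[i][j + 1]
                                 + a[i - 1][j] + a[i + 1][j])
    return a

if __name__ == "__main__":
    # 6x6 grid: top boundary row is 1.0, everything else 0.0
    grid = [[1.0] * 6] + [[0.0] * 6 for _ in range(5)]
    print(jacobi_copy_back(grid, 10) == jacobi_ping_pong(grid, 10))  # True
    print(jacobi_copy_back(grid, 1) == in_place_sweeps(grid, 1))     # False
```

Because the ping-pong variant performs exactly the same arithmetic in the same order, it reproduces the copy-back result bit for bit while removing one statement, and with it the dependence between the two loops, from every time step.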
The Jacobi iteration method is an inherently iterative loop computation for solving partial differential equations. However, data dependences in the Jacobi loop nest are an obstacle to its parallel execution on a state-of-the-art parallel platform, the Graphics Processing Unit (GPU). Mathematical analysis and experiments are used to compare various loop optimization strategies that eliminate these data dependences, significantly enhance the locality of the Jacobi algorithm, exploit the low latency of shared memory, and largely realize the GPU's potential for accelerating the Jacobi algorithm. Moreover, an efficient tile size selection model helps map the computation appropriately onto the GPU and make full use of its computational resources for higher performance. Experimental results demonstrate that the odd-even duplication algorithm achieves more than four times the performance of the traditional parallel Jacobi algorithm on the GPU.
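The idea behind tiling can be sketched in the same spirit. Within a single Jacobi sweep every output point reads only the previous iterate, so the interior of the grid can be cut into tiles that are processed in any order, much as GPU thread blocks would be scheduled. The Python below is a hypothetical illustration that assumes square tiles and a five-point stencil; the names `tiled_sweep`, `tile_origins`, and `num_tiles` are invented for this sketch. It does not implement the paper's tile size selection model; it only shows that the tile size changes the work decomposition (and the number of tiles, i.e. would-be thread blocks) but not the numerical result.

```python
# Hypothetical sketch: one Jacobi sweep with the interior split into square tiles.
# Each tile stands in for one GPU thread block; here the tiles simply run one
# after another, in reverse order, to show that their order does not matter.
import math

def stencil(src, i, j):
    """Five-point Jacobi update for interior point (i, j)."""
    return 0.2 * (src[i][j] + src[i][j - 1] + src[i][j + 1]
                  + src[i - 1][j] + src[i + 1][j])

def tile_origins(n, m, tile):
    """Top-left corners of the tiles covering interior rows 1..n-2, cols 1..m-2."""
    return [(ti, tj) for ti in range(1, n - 1, tile) for tj in range(1, m - 1, tile)]

def tiled_sweep(src, tile):
    """Return the next Jacobi iterate, computed tile by tile."""
    n, m = len(src), len(src[0])
    dst = [row[:] for row in src]          # keeps the fixed boundary values
    for ti, tj in reversed(tile_origins(n, m, tile)):
        # a tile reads only src and writes only its own block of dst,
        # so tiles are independent of each other within one sweep
        for i in range(ti, min(ti + tile, n - 1)):
            for j in range(tj, min(tj + tile, m - 1)):
                dst[i][j] = stencil(src, i, j)
    return dst

def num_tiles(n, m, tile):
    """Number of tiles (would-be thread blocks) produced by a given tile size."""
    return math.ceil((n - 2) / tile) * math.ceil((m - 2) / tile)

def run(grid, steps, tile):
    for _ in range(steps):
        grid = tiled_sweep(grid, tile)
    return grid

if __name__ == "__main__":
    grid = [[1.0] * 10] + [[0.0] * 10 for _ in range(9)]
    reference = run(grid, 5, tile=8)       # one 8x8 tile covers the whole interior
    for t in (1, 2, 3, 4):
        print(t, num_tiles(10, 10, t), run(grid, 5, t) == reference)
```

On a real GPU the choice of tile size would be driven by factors such as shared-memory capacity and the number of resident blocks; this sketch deliberately leaves that choice open rather than guess at the paper's model.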
Source
《小型微型计算机系统》 (Journal of Chinese Computer Systems)
CSCD
Peking University Core Journal
2012, No. 9, pp. 1962-1967 (6 pages)
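Funding
Supported by the Key Project of Science and Technology Research of the Ministry of Education (108008)
Supported by the National High Technology Research and Development Program of China ("863" Program) (2008AA01Z109)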