Abstract
To exploit the computing power of multi-core processors and meet the needs of highly parallel applications, a parallel conjugate gradient algorithm is proposed for solving linear systems with large-scale sparse matrices. On graphics processing units (GPUs), the algorithm makes effective use of the multi-level memory hierarchy and applies optimizations such as thread-to-matrix mapping, coalesced memory access, and data reuse, while efficient thread scheduling hides the high latency of global memory accesses. On Xeon Phi processors, the algorithm exploits the high degree of parallelism and optimizes data communication and transfer, reduction of data dependences, vectorization, and asynchronous computation; efficient thread scheduling is again used to hide the high latency of global memory accesses. Experiments on both platforms verify the feasibility and correctness of the algorithm and compare its running efficiency under the different configurations. The results show that the conjugate gradient method achieves a better speedup on the GPU than on Xeon Phi.
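For orientation, the following is a minimal serial sketch of the conjugate gradient iteration with a sparse matrix-vector product at its core. It is not the authors' GPU or Xeon Phi implementation: the CSR storage layout, the function names (`csr_spmv`, `conjugate_gradient`), the tolerance, and the small test matrix are all assumptions made for illustration. The row loop in `csr_spmv` is the part that a parallel version would typically distribute across threads, since each output row is computed independently.

```python
def csr_spmv(indptr, indices, data, x):
    """Compute y = A @ x for a matrix A stored in CSR form (assumed layout)."""
    y = [0.0] * (len(indptr) - 1)
    for row in range(len(y)):  # rows are independent: natural unit of parallel work
        acc = 0.0
        for k in range(indptr[row], indptr[row + 1]):
            acc += data[k] * x[indices[k]]
        y[row] = acc
    return y


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(indptr, indices, data, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite CSR matrix A.

    Returns the approximate solution and the number of iterations performed.
    """
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x with the initial guess x = 0
    p = list(r)          # first search direction
    rs_old = dot(r, r)
    for it in range(max_iter):
        if rs_old ** 0.5 < tol:   # stop once the residual norm is small enough
            return x, it
        ap = csr_spmv(indptr, indices, data, p)
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x, max_iter


# Hypothetical 3x3 symmetric positive definite system:
# [[4, 1, 0], [1, 3, 1], [0, 1, 2]] x = [1, 2, 3]
indptr = [0, 2, 5, 7]
indices = [0, 1, 0, 1, 2, 1, 2]
data = [4.0, 1.0, 1.0, 3.0, 1.0, 1.0, 2.0]
b = [1.0, 2.0, 3.0]

x, iters = conjugate_gradient(indptr, indices, data, b)
```

Each iteration performs one sparse matrix-vector product, two inner products, and three vector updates, so the cost of the matrix-vector product usually dominates; this is consistent with the abstract's emphasis on memory-access optimizations for that kernel.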
Source
《华南理工大学学报(自然科学版)》
EI
CAS
CSCD
PKU Core (Peking University Core Journals)
2015, No. 11, pp. 35-46, 53 (13 pages in total)
Journal of South China University of Technology (Natural Science Edition)
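Funding
Guangdong Provincial Special Project for Public Welfare Research and Capacity Building (2014A040401018)
Guangdong Provincial Program for Promoting the Development of the Science and Technology Service Industry (2013B040404009)
Funded project of the Guangdong Provincial Key Laboratory of New Media and Brand Communication Innovative Application (2013WSYS0002)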
Keywords
conjugate gradient method
graphics processing unit
Xeon Phi
parallel optimization
sparse matrix-vector multiplication