Abstract
To address the large data volumes and long running times of prefix sum operations in numerical computation, a segmented prefix sum parallel algorithm based on the open computing language (OpenCL) is proposed. First, the parallelism of the segmented prefix sum algorithm was analyzed, and a two-level parallel segmented prefix sum algorithm was designed through hierarchical decomposition and combination of the processing tasks. Then, the parallel algorithm was mapped onto a CPU+GPU platform through OpenCL programming, implementing hierarchical parallel prefix sum processing. Finally, according to the resource conditions of the compute unit (CU), more local memory was allocated in the CU, and the access pattern of the work-items was improved to reduce bank conflicts and increase memory access speed. Experimental results show that, compared with a serial algorithm on an AMD Opteron 2439 SE CPU, a parallel algorithm based on OpenMP (open multi-processing), and a parallel algorithm based on the compute unified device architecture (CUDA), the OpenCL prefix sum parallel algorithm on the NVIDIA Tesla C2075 computing platform achieved speedups of 33.51 times, 6.26 times, and 2.41 times, respectively. These results verify the effectiveness and performance portability of the proposed parallel optimization method.
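The two-level decomposition named in the abstract can be illustrated with a small sequential sketch. It assumes that "segmented" means splitting the input into fixed-size blocks, scanning each block independently, and then combining the results through a scan of the block totals. The abstract does not give the authors' kernel, so the function name `blocked_inclusive_scan` and the parameter `block_size` are hypothetical, and plain Python stands in for the OpenCL work-groups that would process the blocks in parallel.

```python
from itertools import accumulate


def blocked_inclusive_scan(values, block_size):
    """Inclusive prefix sum computed in two levels over fixed-size blocks.

    Illustrative only: names and structure are assumptions, not the
    paper's implementation.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")

    # Level 1: scan every block on its own. On a GPU each block would map
    # to one work-group, and the blocks are independent of each other.
    blocks = [values[i:i + block_size] for i in range(0, len(values), block_size)]
    local_scans = [list(accumulate(block)) for block in blocks]

    # Level 2: an exclusive scan of the block totals gives the offset that
    # each block must add to its local results.
    totals = [scan[-1] for scan in local_scans]
    offsets = [0] + list(accumulate(totals))[:-1]

    # Combine: add each block's offset to its locally scanned elements.
    return [x + offset for scan, offset in zip(local_scans, offsets) for x in scan]
```

For any positive `block_size`, the result should match a plain sequential inclusive scan such as `list(accumulate(values))`. Only the level-1 loop and the final combine step are independent across blocks, which is the part a GPU mapping would parallelize.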
Authors
XIAO Han
LI Cai-lin
GUO Bao-yun
ZHOU Qing-lei
(School of Information Science and Technology, Zhengzhou Normal University, Zhengzhou 450044, China; School of Civil and Architectural Engineering, Shandong University of Technology, Zibo 255000, China; School of Information Engineering, Zhengzhou University, Zhengzhou 450001, China)
Source
Science Technology and Engineering (《科学技术与工程》)
Peking University Core Journal (北大核心)
2019, Issue 31, pp. 215-221 (7 pages)
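Funding
National Natural Science Foundation of China (61572444, 41601496, 41701525)
Natural Science Foundation of Shandong Province (ZR2017LD002)
Key Research and Development Program of Shandong Province (2018GGX106002)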
Keywords
segmented prefix sum
graphics processing unit
open computing language
parallel algorithm
performance optimization