Abstract
The high performance/cost ratio of graphics processing units (GPUs) is attracting a growing share of general-purpose computing workloads. To fully exploit the general-purpose computing capability of GPUs on heterogeneous processing platforms, this paper proposes performance optimization methods targeting the OpenCL model. A polyhedral representation of the source program is built and used to optimize and allocate the GPU's global memory and fast memory separately. For global memory, the memory access patterns of the source program are analyzed to discover access instances that can be vectorized; instances that can be grouped together are identified by means of graph coloring. Data space transformation is then applied to convert irregular memory access patterns so that vector data types can be used to improve off-chip memory bandwidth utilization. For fast memory, data reuse in the program is detected, and data are allocated to distinct fast memory regions according to both the properties of the data accesses and the characteristics of the OpenCL memory model, improving the utilization of on-chip memory. Experiments on six benchmark programs show that the optimized programs achieve speedups of 1.6x to 8.4x over the unoptimized versions, demonstrating the effectiveness of the proposed methods.
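The effect of a data space transformation on vectorizability can be illustrated with a small address-level model. The following is a minimal sketch, not the paper's algorithm: it assumes a row-major matrix in which work-item i reads element (i, col), uses a plain transpose as the transformation, and replaces the graph-coloring grouping step with a simple contiguity and alignment check. All function names are hypothetical.

```python
def column_access(row_len, col, num_items):
    """Addresses read when work-item i loads element (i, col) of a row-major matrix."""
    return [i * row_len + col for i in range(num_items)]


def transposed_access(num_rows, col, num_items):
    """Addresses of the same logical elements after transposing the data space.

    Assumption: element (i, col) is stored at col * num_rows + i.
    """
    return [col * num_rows + i for i in range(num_items)]


def vector_groups(addresses, width=4):
    """Return the groups of `width` accesses that one vector load could serve.

    A group qualifies only if its addresses are contiguous and the first
    address is aligned to `width` elements (a simplifying assumption).
    """
    groups = []
    for start in range(0, len(addresses) - width + 1, width):
        chunk = addresses[start:start + width]
        contiguous = all(b - a == 1 for a, b in zip(chunk, chunk[1:]))
        aligned = chunk[0] % width == 0
        if contiguous and aligned:
            groups.append(chunk)
    return groups


before = column_access(row_len=8, col=2, num_items=8)       # stride-8 accesses
after = transposed_access(num_rows=8, col=2, num_items=8)   # unit-stride accesses
print(len(vector_groups(before)), len(vector_groups(after)))  # prints "0 2"
```

In this model the strided accesses yield no 4-wide groups, while the transposed layout yields two, which is the situation in which a vector data type such as OpenCL's `float4` becomes usable.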
Source
Journal of Computer-Aided Design & Computer Graphics (《计算机辅助设计与图形学学报》)
EI
CSCD
PKU Core Journal (北大核心)
2011, Issue 4, pp. 571-581 (11 pages)
Funding
Shanghai Leading Academic Discipline Project (B114)
AMD University Cooperation Program
Keywords
OpenCL
GPU
performance optimization
heterogeneous processing
general-purpose computing
polyhedral representation