Research on Image Remap Algorithm Optimization Based on OpenCL

Cited by: 3
Abstract Image remapping (remap) is a typical image transformation algorithm, widely used in image scaling, warping, and rotation. As image sizes and resolutions keep growing, the performance demands on remap algorithms grow with them. Taking full account of the hardware architecture differences between GPU platforms, this paper systematically studies how to implement the remap algorithm efficiently under the OpenCL framework on different GPU platforms. It examines how several optimizations, including off-chip (global) memory access optimization, vectorized computation, and reduction of dynamic instructions such as branches, affect performance on each platform, and it argues that performance portability across GPU platforms is achievable. Experimental results show that, excluding data transfer time, the optimized algorithm on an AMD HD5850 GPU achieves a speedup of 114.3 to 491.5 times over the CPU version and 1.01 to 1.86 times over the CUDA version (the existing GPU implementation); on an NVIDIA C2050 GPU it achieves a speedup of 100.7 to 369.8 times over the CPU version and 0.95 to 1.58 times over the CUDA version. These results support the effectiveness and performance portability of the proposed optimization methods.
Source E-science Technology & Application (《科研信息化技术与应用》), 2013, No. 1, pp. 57-66 (10 pages)
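Funding National Natural Science Foundation of China (60303020, 40806040); Key Program of the National Natural Science Foundation of China (60533020); Young Scientists Fund of the National Natural Science Foundation of China (61100072)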
Keywords OpenCL; general-purpose computing; parallel computing; image remap; cross-platform
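
For readers unfamiliar with the operation named in the abstract, the following is a minimal sequential sketch of a remap in plain Python. It is not the paper's OpenCL implementation: the function name `remap_bilinear`, the choice of bilinear interpolation, and the constant border value are all assumptions made for illustration, since the abstract does not specify them. What it does show is that each output pixel is computed independently from a coordinate lookup, which is the property that lets a GPU assign one work-item per pixel.

```python
import math


def remap_bilinear(src, map_x, map_y, border=0.0):
    """Return dst where dst[y][x] samples src at (map_x[y][x], map_y[y][x]).

    Hypothetical reference version: bilinear interpolation and a constant
    border value are assumptions, not details taken from the paper.
    """
    h, w = len(src), len(src[0])

    def pixel(px, py):
        # Reads outside the source image return the constant border value.
        if 0 <= px < w and 0 <= py < h:
            return src[py][px]
        return border

    dst = []
    for row_x, row_y in zip(map_x, map_y):
        out_row = []
        for fx, fy in zip(row_x, row_y):
            # Integer corner of the 2x2 neighbourhood and fractional offsets.
            x0, y0 = math.floor(fx), math.floor(fy)
            ax, ay = fx - x0, fy - y0
            # Interpolate along x on the two rows, then along y.
            top = (1 - ax) * pixel(x0, y0) + ax * pixel(x0 + 1, y0)
            bottom = (1 - ax) * pixel(x0, y0 + 1) + ax * pixel(x0 + 1, y0 + 1)
            out_row.append((1 - ay) * top + ay * bottom)
        dst.append(out_row)
    return dst


if __name__ == "__main__":
    image = [[0.0, 10.0], [20.0, 30.0]]
    # Horizontal flip: each output pixel reads from the mirrored column.
    flipped = remap_bilinear(image, [[1, 0], [1, 0]], [[0, 0], [1, 1]])
    print(flipped)
```

The inner loop body has no dependence on any other output pixel, so the memory-access and vectorization concerns listed in the abstract apply to how the four neighbouring source reads are fetched and combined.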