Abstract
Compute Unified Device Architecture (CUDA) is a vital new force in general-purpose computing and the engine of the world's most powerful computer. Because of the particularities of its architecture, however, CUDA programs must be specifically optimized. To help programmers understand CUDA program optimization, this paper describes optimization measures in terms of programming method, memory usage, and instruction-stream optimization, and compares them through tests on a worked example. The test results show that the fully optimized program runs 30 times faster than the unoptimized version. Finally, a reference ordering of the optimization measures is given.
Source
《信息技术》
2011, No. 12, pp. 51-54, 84 (5 pages)
Information Technology
Keywords
CUDA
program
optimization
signal processing