Source Code Migration of CUDA Programs to the Cell Platform

Abstract: Compared with porting traditional serial programs, porting code between parallel systems is far more complex because of the large differences between architectures. To port Compute Unified Device Architecture (CUDA) programs to other heterogeneous multi-core platforms, a scheme for mapping the CUDA architecture to Cell is proposed. Through model mapping, parallel granularity promotion, shared variable removal, and runtime optimization, the massively parallel threads of a CUDA program can execute correctly on the Cell platform. Experimental results show that the translated programs reach 72% of the execution efficiency of hand-written programs on the Cell platform.
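The abstract does not show the transformation itself, so the following is only a minimal sketch of what parallel granularity promotion could look like, assuming a thread-loop style of translation: the body of a CUDA kernel is wrapped in a loop over the logical threads of one block, so that a whole block becomes a single coarse-grained task that could be handed to one worker core. The kernel `scale`, the names `scale_block` and `launch_scale`, and the sizes `BLOCK_DIM` and `GRID_DIM` are all hypothetical and do not come from the paper. Plain C is used, with no Cell SDK calls.

```c
#define BLOCK_DIM 4                    /* hypothetical threads per block */
#define GRID_DIM  2                    /* hypothetical blocks per grid   */
#define N (BLOCK_DIM * GRID_DIM)       /* total number of elements       */

/* Original CUDA kernel, shown for reference only:
 *
 *   __global__ void scale(float *out, const float *in, float k) {
 *       int i = blockIdx.x * blockDim.x + threadIdx.x;
 *       out[i] = k * in[i];
 *   }
 */

/* Coarsened form: one call executes every logical thread of one block.
 * The implicit threadIdx.x becomes an explicit loop variable. */
static void scale_block(float *out, const float *in, float k, int block_idx)
{
    for (int thread_idx = 0; thread_idx < BLOCK_DIM; ++thread_idx) {
        int i = block_idx * BLOCK_DIM + thread_idx;
        out[i] = k * in[i];
    }
}

/* Stand-in for the kernel launch: iterates over blocks sequentially.
 * On a real Cell target each block would be dispatched to a worker core;
 * that dispatch is deliberately left out of this sketch. */
static void launch_scale(float *out, const float *in, float k)
{
    for (int block_idx = 0; block_idx < GRID_DIM; ++block_idx)
        scale_block(out, in, k, block_idx);
}
```

Calling `launch_scale(out, in, 2.0f)` on two arrays of `N` floats fills `out[i]` with `2 * in[i]`, matching what the original kernel would compute for a launch of `GRID_DIM` blocks of `BLOCK_DIM` threads. A kernel containing `__syncthreads()` would need the loop split at each barrier, which this sketch does not cover.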

Authors: Yue Feng (岳峰), Pang Jianmin (庞建民), Zhang Yichi (张一弛), Yu Yong (余勇)

Affiliation: Institute of Information Engineering, PLA Information Engineering University (解放军信息工程大学信息工程学院)

Source: Computer Engineering (《计算机工程》), indexed in CAS and CSCD, 2012, Issue 24, pp. 279-282 (4 pages)

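Funding: National "863" Program (2009AA012201); "核高基" (Core Electronic Devices, High-end Generic Chips, and Basic Software) Major Special Project (2009ZX01036-001-001); Henan Province Major Science and Technology Research Program (092101210501)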

Keywords: source code migration; heterogeneous multi-core; model mapping; shared variable removal; runtime optimization

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]


