2 articles found
Improving performance portability for GPU-specific OpenCL kernels on multi-core/many-core CPUs by analysis-based transformations
1
Authors: Mei WEN, Da-fei HUANG, (+1 more author), Chang-qing XUN, Dong CHEN. Frontiers of Information Technology & Electronic Engineering (SCIE, EI, CSCD), 2015, No. 11, pp. 899-916 (18 pages)
OpenCL is an open heterogeneous programming framework. Although OpenCL programs are functionally portable, they do not provide performance portability, so code transformation often plays an irreplaceable role. When adapting GPU-specific OpenCL kernels to run on multi-core/many-core CPUs, coarsening the thread granularity is necessary and thus has been extensively used. However, locality concerns exposed in GPU-specific OpenCL code are usually inherited without analysis, which may give side-effects on the CPU performance. Typically, the use of OpenCL's local memory on multi-core/many-core CPUs may lead to an opposite performance effect, because local-memory arrays no longer match well with the hardware and the associated synchronizations are costly. To solve this dilemma, we actively analyze the memory access patterns using array-access descriptors derived from GPU-specific kernels, which can thus be adapted for CPUs by (1) removing all the unwanted local-memory arrays together with the obsolete barrier statements and (2) optimizing the coalesced kernel code with vectorization and locality re-exploitation. Moreover, we have developed an automated tool chain that makes this transformation of GPU-specific OpenCL kernels into a CPU-friendly form, which is accompanied with a scheduler that forms a new OpenCL runtime. Experiments show that the automated transformation can improve OpenCL kernel performance on a multi-core CPU by an average factor of 3.24. Satisfactory performance improvements are also achieved on Intel's many-integrated-core coprocessor. The resultant performance on both architectures is better than or comparable with the corresponding OpenMP performance.
Keywords: OpenCL; performance portability; multi-core/many-core CPU; analysis-based transformation
A Survey of Accelerator Research for General-Purpose Processors (通用处理器加速器研究综述) Cited by: 1
2
Authors: LU Yi (陆祎), BU Guoqiang (卜国强). Computer Applications and Software (《计算机应用与软件》; CSCD, PKU Core), 2013, No. 8, pp. 4-8 (5 pages)
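An accelerator is a hardware component that assists a general-purpose processor in processing certain specific applications efficiently, and it can be used to address the performance bottlenecks that currently arise in general-purpose processor design. Mainstream accelerator research today covers two main aspects: efficient accelerator design and effective cooperation with the general-purpose processor. This research is of great significance for extending the application domains of accelerators and for using the computing resources they provide more effectively to improve application performance. The paper surveys and summarizes the hot topics in current accelerator research and, on the basis of an analysis and evaluation of individual research projects, offers an outlook on possible directions for accelerator development.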
Keywords: accelerator; programmability; multi-core; many-core; data communication optimization