期刊文献+
共找到14篇文章
< 1 >
每页显示 20 50 100
通用图形处理器功耗估算模型 被引量:2
1
作者 王吉军 程华 《计算机工程》 CAS CSCD 北大核心 2017年第2期92-97,104,共7页
为精准快速地获得GPU功耗数据,提出一种基于硬件性能计数事件的通用图形处理器(GPGPU)功耗估算方法。通过分析GPGPU程序运行时的功耗分布情况,选择一组与应用程序运行功耗密切相关的硬件性能计数事件集合,使用反向传播人工神经网络分析... 为精准快速地获得GPU功耗数据,提出一种基于硬件性能计数事件的通用图形处理器(GPGPU)功耗估算方法。通过分析GPGPU程序运行时的功耗分布情况,选择一组与应用程序运行功耗密切相关的硬件性能计数事件集合,使用反向传播人工神经网络分析硬件性能计数事件与实时功耗间的关系,最终建立GPGPU功耗估算模型。实验结果表明,与多元线性回归的功耗估算模型相比,该模型具有更高的估算准确性和通用性。 展开更多
关键词 通用图形处理器 硬件性能计数事件 反向传播人工神经网络 交叉验证 功耗估算
下载PDF
An Efficient Acceleration of Solving Heat and Mass Transfer Equations with the First Kind Boundary Conditions in Capillary Porous Radially Composite Cylinder Using Programmable Graphics Hardware
2
作者 Hira Narang Fan Wu Abdul Rafae Mohammed 《Journal of Computer and Communications》 2019年第7期267-281,共15页
With the latest advances in computing technology, a huge amount of efforts have gone into simulation of a range of scientific phenomena in engineering fields. One such case is the simulation of heat and mass transfer ... With the latest advances in computing technology, a huge amount of efforts have gone into simulation of a range of scientific phenomena in engineering fields. One such case is the simulation of heat and mass transfer in capillary porous media, which is becoming more and more necessary in analyzing a number of eventualities in science and engineering applications. However, this procedure of numerical solution of heat and mass transfer equations for capillary porous media is very time consuming. Therefore, this paper pursuit is at making use of one of the acceleration methods developed in the graphics community that exploits a graphical processing unit (GPU), which is applied to the numerical solutions of such heat and mass transfer equations. The nVidia Compute Unified Device Architecture (CUDA) programming model offers a correct approach of applying parallel computing to applications with graphical processing unit. This paper suggests a true improvement in the performance while solving the heat and mass transfer equations for capillary porous radially composite cylinder with the first type of boundary conditions. This heat and mass transfer simulation is carried out through the usage of CUDA platform on nVidia Quadro FX 4800 graphics card. Our experimental outcomes exhibit the drastic overall performance enhancement when GPU is used to illustrate heat and mass transfer simulation. GPU can considerably accelerate the performance with a maximum found speedup of more than 5-fold times. Therefore, the GPU is a good strategy to accelerate the heat and mass transfer simulation in porous media. 展开更多
关键词 Numerical Solution Heat and Mass Transfer general purpose graphics processing unit (gpgpu) CUDA
下载PDF
An Efficient Acceleration of Solving Heat and Mass Transfer Equations with the Second Kind Boundary Conditions in Capillary Porous Composite Cylinder Using Programmable Graphics Hardware
3
作者 Hira Narang Fan Wu Abdul Rafae Mohammed 《Journal of Computer and Communications》 2018年第9期24-38,共15页
With the recent developments in computing technology, increased efforts have gone into simulation of various scientific methods and phenomenon in engineering fields. One such case is the simulation of heat and mass tr... With the recent developments in computing technology, increased efforts have gone into simulation of various scientific methods and phenomenon in engineering fields. One such case is the simulation of heat and mass transfer in capillary porous media, which is becoming more and more important in analysing various scenarios in engineering applications. Analysing such heat and mass transfer phenomenon in a given environment requires us to simulate it. This entails simulation of coupled heat mass transfer equations. However, this process of numerical solution of heat and mass transfer equations is very much time consuming. Therefore, this paper aims at utilizing one of the acceleration techniques developed in the graphics community that exploits a graphics processing unit (GPU) which is applied to the numerical solutions of heat and mass transfer equations. The nVidia Compute Unified Device Architecture (CUDA) programming model caters a good method of applying parallel computing to program the graphical processing unit. This paper shows a good improvement in the performance while solving the heat and mass transfer equations for capillary porous composite cylinder with the second kind of boundary conditions numerically running on GPU. This heat and mass transfer simulation is implemented using CUDA platform on nVidia Quadro FX 4800 graphics card. Our experimental results depict the drastic performance improvement when GPU is used to perform heat and mass transfer simulation. GPU can significantly accelerate the performance with a maximum observed speedup of more than 7-fold times. Therefore, the GPU is a good approach to accelerate the heat and mass transfer simulation. 展开更多
关键词 Numerical Solution Heat and Mass Transfer general purpose graphics processing unit (gpgpu) CUDA
下载PDF
基于2阶段同步的GPGPU线程块压缩调度方法 被引量:1
4
作者 张军 何炎祥 +2 位作者 沈凡凡 江南 李清安 《计算机研究与发展》 EI CSCD 北大核心 2016年第6期1173-1185,共13页
通用图形处理器(general purpose graphics processing unit,GPGPU)在面向高性能计算、高吞吐量的通用计算领域的应用日益广泛,它采用的SIMD(single instruction multiple data)执行模式使其能获得强大的并行计算能力.目前主流的通用图... 通用图形处理器(general purpose graphics processing unit,GPGPU)在面向高性能计算、高吞吐量的通用计算领域的应用日益广泛,它采用的SIMD(single instruction multiple data)执行模式使其能获得强大的并行计算能力.目前主流的通用图形处理器均通过大量高度并行的线程完成计算任务的高效执行.但是在处理条件分支转移的控制流中,由于通用图形处理器采用串行的方式顺序处理不同的分支路径,使得其并行计算能力受到影响.在分析讨论前人针对分支转移处理低效的线程块压缩重组调度方法的基础上,提出了2阶段同步的线程块压缩重组调度方法 TSTBC(two-stage synchronization based thread block compaction scheduling),通过线程块压缩重组适合性判断逻辑部件,分2个阶段对线程块进行压缩重组有效性分析,进一步减少了无效的线程块压缩重组次数.模拟实验结果表明:该方法较好地提高了线程块的压缩重组有效性,相对于其他同类方法降低了对线程组内部数据局部性的破坏,并使得片上一级数据cache的访问失效率得到有效降低;相对于基准体系结构,系统性能提升了19.27%. 展开更多
关键词 通用图形处理器 线程调度 线程块压缩重组 2阶段同步 分支转移
下载PDF
基于GPGPU的准实时测频技术
5
作者 张朝晖 於建生 +1 位作者 薛钰娟 徐勤建 《雷达科学与技术》 2011年第2期183-187,共5页
简要介绍了GPGPU技术及CUDA编程架构,并在CUDA技术的基础上应用现代信号处理的方法实现了对超高速采样信号的准实时数字测频算法。仿真表明算法内核的计算延时很小;通过对现场1GHz超高速采样实际信号数据的验证,证明该技术能够满足准实... 简要介绍了GPGPU技术及CUDA编程架构,并在CUDA技术的基础上应用现代信号处理的方法实现了对超高速采样信号的准实时数字测频算法。仿真表明算法内核的计算延时很小;通过对现场1GHz超高速采样实际信号数据的验证,证明该技术能够满足准实时测量频率和其他脉冲参数的要求,同时对每一脉冲还可给出测频参考误差;并且在典型信噪比下,测频精度远高于模拟测频接收机。该实现与其他实现方案相比,灵活性更好,性价比更高,具有良好的应用前景。 展开更多
关键词 通用图形处理器(gpgpu) 准实时 数字测频 超高速采样
下载PDF
基于GPU通用计算的并行算法和计算框架的实现 被引量:3
6
作者 朱宇兰 《山东农业大学学报(自然科学版)》 CSCD 2016年第3期473-476,480,共5页
GPU通用计算是近几年来迅速发展的一个计算领域,以其强大的并行处理能力为密集数据单指令型计算提供了一个绝佳的解决方案,但受限制于芯片的制造工艺,其运算能力遭遇瓶颈。本文从GPU通用计算的基础——图形API开始,分析GPU并行算法特征... GPU通用计算是近几年来迅速发展的一个计算领域,以其强大的并行处理能力为密集数据单指令型计算提供了一个绝佳的解决方案,但受限制于芯片的制造工艺,其运算能力遭遇瓶颈。本文从GPU通用计算的基础——图形API开始,分析GPU并行算法特征、运算的过程及特点,并抽象出了一套并行计算框架。通过计算密集行案例,演示了框架的使用方法,并与传统GPU通用计算的实现方法比较,证明了本框架具有代码精简、与图形学无关的特点。 展开更多
关键词 GPU通用计算 并行计算 计算框架
下载PDF
A multi-scale architecture for multi-scale simulation and its application to gas-solid flows 被引量:1
7
作者 Bo Li Guofeng Zhou +4 位作者 Wei Ge Limin Wang Xiaowei Wang Li Guo Jinghai Li 《Particuology》 SCIE EI CAS CSCD 2014年第4期160-169,共10页
A multi-scale hardware and software architecture implementing the EMMS (energy-minimization multi-scale) paradigm is proven to be effective in the simulation of a two-dimensional gas-solid suspension. General purpos... A multi-scale hardware and software architecture implementing the EMMS (energy-minimization multi-scale) paradigm is proven to be effective in the simulation of a two-dimensional gas-solid suspension. General purpose CPUs are employed for macro-scale control and optimization, and many integrated cores (MlCs) operating in multiple-instruction multiple-data mode are used for a molecular dynamics simulation of the solid particles at the meso-scale. Many cores operating in single-instruction multiple- data mode, such as general purpose graphics processing units (GPGPUs), are employed for direct numerical simulation of the fluid flow at the micro-scale using the lattice Boltzmann method. This architecture is also expected to be efficient for the multi-scale simulation of other comolex systems. 展开更多
关键词 general purpose graphics processing unitgpgpu)Many integrated core (MIC)Meso-science Multiple-instruction multiple-dataSingle-instruction multiple-dataVirtual process engineering
原文传递
Exploiting Parallelism in the Simulation of General Purpose Graphics Processing Unit Program
8
作者 赵夏 马胜 +1 位作者 陈微 王志英 《Journal of Shanghai Jiaotong university(Science)》 EI 2016年第3期280-288,共9页
The simulation is an important means of performance evaluation of the computer architecture. Nowadays, the serial simulation of general purpose graphics processing unit(GPGPU) architecture is the main bottleneck for t... The simulation is an important means of performance evaluation of the computer architecture. Nowadays, the serial simulation of general purpose graphics processing unit(GPGPU) architecture is the main bottleneck for the simulation speed. To address this issue, we propose the intra-kernel parallelization on a multicore processor and the inter-kernel parallelization on a multiple-machine platform. We apply these two methods to the GPGPU-sim simulator. The intra-kernel parallelization method firstly parallelizes the serial simulation of multiple compute units in one cycle. Then it parallelizes the timing and functional simulation to reduce the performance loss caused by the synchronization between different compute units. The inter-kernel parallelization method divides multiple kernels of a CUDA program into several groups and distributes these groups across multiple simulation hosts to perform the simulation. Experimental results show that the intra-kernel parallelization method achieves a speed-up of up to 12 with a maximum error rate of 0.009 4% on a 32-core machine, and the inter-kernel parallelization method can accelerate the simulation by a factor of up to 3.9 with a maximum error rate of 0.11% on four simulation hosts. The orthogonality between these two methods allows us to combine them together on multiple multi-core hosts to get further performance improvements. 展开更多
关键词 general purpose graphics processing unit(gpgpu) MULTICORE intra-kernel inter-kernel parallel
原文传递
一种基于深度纹理的碰撞检测
9
作者 黄鹏 孟明 《计算机应用与软件》 CSCD 北大核心 2013年第1期270-272,293,共4页
提出一种基于GPU的图像空间碰撞检测方法。利用OpenGL的深度纹理技术和其扩展功能帧缓存对象(FBO)实现将深度值直接渲染至纹理,然后采用通用计算图形处理器(GPGPU)技术对纹理中的数据进行处理实现碰撞检测。仿真实验显示,该方法充分利... 提出一种基于GPU的图像空间碰撞检测方法。利用OpenGL的深度纹理技术和其扩展功能帧缓存对象(FBO)实现将深度值直接渲染至纹理,然后采用通用计算图形处理器(GPGPU)技术对纹理中的数据进行处理实现碰撞检测。仿真实验显示,该方法充分利用了图形硬件GPU的性能,并能够正确地返回碰撞检测结果。 展开更多
关键词 碰撞检测 深度纹理 FBO gpgpu
下载PDF
面向GPU并行编程的线程同步综述
10
作者 高岚 赵雨晨 +2 位作者 张伟功 王晶 钱德沛 《软件学报》 EI CSCD 北大核心 2024年第2期1028-1047,共20页
并行计算已成为主流趋势.在并行计算系统中,同步是关键设计之一,对硬件性能的充分利用至关重要.近年来,GPU(graphic processing unit,图形处理器)作为应用最为广加速器得到了快速发展,众多应用也对GPU线程同步提出更高要求.然而,现有GP... 并行计算已成为主流趋势.在并行计算系统中,同步是关键设计之一,对硬件性能的充分利用至关重要.近年来,GPU(graphic processing unit,图形处理器)作为应用最为广加速器得到了快速发展,众多应用也对GPU线程同步提出更高要求.然而,现有GPU系统却难以高效地支持真实应用中复杂的线程同步.研究者虽然提出了很多支持GPU线程同步的方法并取得了较大进展,但GPU独特的体系结构及并行模式导致GPU线程同步的研究仍然面临很多挑战.根据不同的线程同步目的和粒度对GPU并行编程中的线程同步进行分类.在此基础上,围绕GPU线程同步的表达和执行,首先分析总结GPU线程同步存在的难以高效表达、错误频发、执行效率低的关键问题及挑战;而后依据不同的GPU线程同步粒度,从线程同步表达方法和性能优化方法两个方面入手,介绍近年来学术界和产业界对GPU线程竞争同步及合作同步的研究,对现有研究方法进行分析与总结.最后,指出GPU线程同步未来的研究趋势和发展前景,并给出可能的研究思路,从而为该领域的研究人员提供参考. 展开更多
关键词 通用图形处理器(gpgpu) 并行编程 线程同步 性能优化
下载PDF
大规模稀疏矩阵的主特征向量计算优化方法 被引量:3
11
作者 王伟 陈建平 +2 位作者 曾国荪 俞莉花 谭一鸣 《计算机科学与探索》 CSCD 2012年第2期118-124,共7页
矩阵主特征向量(principal eigenvectors computing,PEC)的求解是科学与工程计算中的一个重要问题。随着图形处理单元通用计算(general-purpose computing on graphics pro cessing unit,GPGPU)的兴起,利用GPU来优化大规模稀疏矩阵的图... 矩阵主特征向量(principal eigenvectors computing,PEC)的求解是科学与工程计算中的一个重要问题。随着图形处理单元通用计算(general-purpose computing on graphics pro cessing unit,GPGPU)的兴起,利用GPU来优化大规模稀疏矩阵的图形处理单元求解得到了广泛关注。分别从应用特征和GPU体系结构特征两方面分析了PEC运算的性能瓶颈,提出了一种面向GPU的稀疏矩阵存储格式——GPU-ELL和一个针对GPU的线程优化映射策略,并设计了相应的PEC优化执行算法。在ATI HD Radeon5850上的实验结果表明,相对于传统CPU,该方案获得了最多200倍左右的加速,相对于已有GPU上的实现,也获得了2倍的加速。 展开更多
关键词 图形处理单元通用计算(gpgpu) 主特征向量计算 稀疏矩阵向量乘 线程优化
下载PDF
Fast OBJ file importing and parsing in CUDA 被引量:2
12
作者 Aidan L.Possemiers Ickjai Lee 《Computational Visual Media》 2015年第3期229-238,共10页
Alias – Wavefront OBJ meshes are a common text file type for transferring 3D mesh data between applications made by different vendors.However, as the mesh complexity gets higher and denser, the files become larger an... Alias – Wavefront OBJ meshes are a common text file type for transferring 3D mesh data between applications made by different vendors.However, as the mesh complexity gets higher and denser, the files become larger and slower to import.This paper explores the use of GPUs to accelerate the importing and parsing of OBJ files by studying file read-time, runtime, and load resistance. We propose a new method of reading and parsing that circumvents GPU architecture limitations and improves performance, seeing the new GPU method outperforms CPU methods with a 6×– 8× speedup. When running on a heavily loaded system, the new method only received an 80% performance hit, compared to the160% that the CPU methods received. The loaded GPU speedup compared to unloaded CPU methods was3.5×, and, when compared to loaded CPU methods,8×. These results demonstrate that the time is right for further research into the use of data-parallel GPU acceleration beyond that of computer graphics and high performance computing. 展开更多
关键词 PARSING OBJ vertex buffer object(VBO) general-purpose programming on the graphics processing unit(gpgpu) compute unified device architecture(CUDA)
原文传递
位图连接索引服务机制研究
13
作者 张延松 苏明川 +1 位作者 张宇 王方舟 《计算机工程与应用》 CSCD 北大核心 2015年第5期107-115,共9页
位图连接索引是数据仓库中一种有效的优化表间连接操作性能的索引机制。在大内存分析处理应用场景下,位图连接索引不仅需要权衡索引的内存和CPU开销,还需要进一步考虑处理器平台所带来的性能收益和数据访问延迟。提出了基于服务的位图... 位图连接索引是数据仓库中一种有效的优化表间连接操作性能的索引机制。在大内存分析处理应用场景下,位图连接索引不仅需要权衡索引的内存和CPU开销,还需要进一步考虑处理器平台所带来的性能收益和数据访问延迟。提出了基于服务的位图连接索引管理机制,其主要特点体现在三个方面:独立于数据库的自管理索引机制;基于存储空间约束的TOP K关键字位图连接索引机制;处理器敏感(processor-conscious)的位图连接索引技术。索引服务将索引从数据库中内置的数据结构变成数据库外的索引服务层,通过对用户查询负载的分析模块和索引服务管理模块改变传统的由数据库管理员人工管理索引的模式,同时借助于协处理器和内存云技术提高索引服务的性能和灵活性。实验测试结果表明,索引服务机制能够有效地提高索引存储和访问效率,在通用GPU的强大并行处理能力的支持下,位图连接索引服务的性能和数据库整体查询处理性能都得到了显著的提升。 展开更多
关键词 位图连接索引 通用图形处理器(gpgpu) 关键字位图连接索引 处理器敏感位图连接索引
下载PDF
一种高效的压缩Page Walk Cache结构
14
作者 贾朝阳 张敦博 +1 位作者 王琼 沈立 《计算机工程与科学》 CSCD 北大核心 2020年第9期1521-1528,共8页
通用图形处理单元(GPGPU)已被广泛应用于现代高性能计算系统中。GPGPU的单指令多线程执行模型导致快表命中率较低,特别是对于那些不规则应用,需要借助PWC减少实际的页表访问次数。传统PWC中存在很多冗余信息,加之容量有限,实际效果并不... 通用图形处理单元(GPGPU)已被广泛应用于现代高性能计算系统中。GPGPU的单指令多线程执行模型导致快表命中率较低,特别是对于那些不规则应用,需要借助PWC减少实际的页表访问次数。传统PWC中存在很多冗余信息,加之容量有限,实际效果并不理想。分析了传统PWC中的信息冗余情况,提出了一种新结构——压缩PWC。压缩PWC在保证查找开销不变的基础上,完全消除了冗余信息,压缩了空间,使PWC能够记录更多的页表访问历史,从而有效减少地址转换过程中访问页表的次数。测试结果表明,与相同容量的传统PWC相比,压缩PWC可以显著缩短虚实地址转换时间开销。 展开更多
关键词 通用图形处理器 虚实地址转换 页表遍历缓存
下载PDF
上一页 1 下一页 到第
使用帮助 返回顶部