Journal articles
1,430 articles found
Design and Performance Optimization of the KD-90 Entry-Level Personal High-Performance Computer System (cited by 8)
1
Authors: 蔡晔, 刘刚, 毛睿, 罗秋明, 陈国良. 深圳大学学报(理工版) [Journal of Shenzhen University (Science and Engineering)]. Indexed in EI, CAS, PKU Core. 2013, No. 2, pp. 138-143, 6 pages.
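This paper reports KD-90, China's first teraflops-class high-performance computer system built on the independently designed and developed Loongson 3B 8-core processor. The system features high compute density, low power consumption, low cost, and a small footprint. It uses a three-level SMP→CC-NUMA→Cluster parallel architecture and an interconnect hardware design that combines general-purpose and dedicated protocols, achieving a breakthrough in the key technologies of CC-NUMA cluster architecture. Using vector-unit acceleration, it implements a programming model that combines general-purpose processors with vector coprocessors. System performance was optimized with respect to the architectural characteristics and the operating system kernel, and performance tests and analysis were carried out.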
Keywords: computer engineering; personal high-performance computer system; Loongson; parallel architecture; high-performance computing
High-Performance Flow Classification of Big Data Using Hybrid CPU-GPU Clusters of Cloud Environments
2
Authors: Azam Fazel-Najafabadi, Mahdi Abbasi, Hani H. Attar, Ayman Amer, Amir Taherkordi, Azad Shokrollahi, Mohammad R. Khosravi, Ahmed A. Solyman. Tsinghua Science and Technology. Indexed in SCIE, EI, CAS, CSCD. 2024, No. 4, pp. 1118-1137, 20 pages.
The network switches in the data plane of Software Defined Networking (SDN) are empowered by an elementary process in which an enormous number of packets, resembling big volumes of data, are classified into specific flows by matching them against a set of dynamic rules. This basic process accelerates data processing: instead of processing individual packets repeatedly, actions are performed on the corresponding flows of packets. In this paper, we first address the limitations of a typical packet classification algorithm such as Tuple Space Search (TSS). Then, we present a set of scenarios for parallelizing it on different parallel processing platforms, including Graphics Processing Units (GPUs), clusters of Central Processing Units (CPUs), and hybrid clusters. Experimental results show that the hybrid cluster provides the best platform for parallelizing packet classification algorithms, with an average throughput of 4.2 million packets per second (Mpps). That is, the hybrid cluster that integrates the Compute Unified Device Architecture (CUDA), Message Passing Interface (MPI), and OpenMP programming models classified 0.24 million more packets per second than the GPU cluster scheme. Such a packet classifier meets the processing speed required by programmable network systems used to transmit big medical data.
Keywords: OpenMP; Compute Unified Device Architecture (CUDA); Message Passing Interface (MPI); packet classification; medical data; tuple space algorithm; Graphics Processing Unit (GPU) cluster
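Tuple Space Search, the baseline that the paper parallelizes, groups rules by the tuple of prefix lengths they use, so each group can be probed with a single exact-match hash lookup on the masked packet header. The following is a minimal single-threaded Python sketch under simplifying assumptions: only two fields (IPv4 source and destination prefixes), a lower number meaning higher priority, and hypothetical class and method names. It is not the authors' code; GPU, MPI, or OpenMP variants would distribute the per-packet loop or the per-tuple probes across workers.

```python
from ipaddress import IPv4Address
from typing import Dict, Optional, Tuple


def mask(addr: int, length: int) -> int:
    """Keep the top `length` bits of a 32-bit IPv4 address."""
    if length == 0:
        return 0
    return addr & (((1 << length) - 1) << (32 - length))


class TupleSpace:
    """Hypothetical two-field classifier: one hash table per (src_len, dst_len) tuple."""

    def __init__(self) -> None:
        self.tables: Dict[Tuple[int, int], Dict[Tuple[int, int], Tuple[int, str]]] = {}

    def add_rule(self, src: str, src_len: int, dst: str, dst_len: int,
                 priority: int, action: str) -> None:
        key = (mask(int(IPv4Address(src)), src_len), mask(int(IPv4Address(dst)), dst_len))
        table = self.tables.setdefault((src_len, dst_len), {})
        # Lower number = higher priority; keep the best rule for an identical key.
        if key not in table or priority < table[key][0]:
            table[key] = (priority, action)

    def classify(self, src: str, dst: str) -> str:
        s, d = int(IPv4Address(src)), int(IPv4Address(dst))
        best: Optional[Tuple[int, str]] = None
        # One exact-match probe per tuple; every tuple has to be probed.
        for (src_len, dst_len), table in self.tables.items():
            hit = table.get((mask(s, src_len), mask(d, dst_len)))
            if hit is not None and (best is None or hit[0] < best[0]):
                best = hit
        return best[1] if best is not None else "default"
```

Because every tuple must be probed, lookup cost grows with the number of distinct prefix-length tuples, which is why the probes are a natural target for parallel hardware.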
High-precision parallel computing model of solute transport based on GPU acceleration
3
Authors: Shang-hong Zhang, Rong-qi Zhang, Wen-da Li, Xi-yan Yang, Yang Zhou. Journal of Hydrodynamics. Indexed in SCIE, EI, CSCD. 2024, No. 1, pp. 202-212, 11 pages.
Scenario simulation analysis of water environmental emergencies is very important for risk prevention and control and for emergency response. To quickly and accurately simulate the transport and diffusion of high-intensity pollutants during sudden water pollution events, this study proposes a high-precision pollutant transport and diffusion model for unstructured grids based on the Compute Unified Device Architecture (CUDA). A finite volume method with a total variation diminishing (TVD) limiter using the r-factor proposed by Kong is used to reduce numerical diffusion and oscillation errors when simulating pollutants under sharp concentration gradients, and graphics processing unit acceleration is used to improve computational efficiency. The advection-diffusion process of the model is verified numerically with two benchmark cases, and its efficiency is evaluated with an engineering example. The results demonstrate that the model performs well in simulating material transport under sharp concentration gradients and is computationally efficient: the accelerated model runs 46 times faster than the single-threaded original model. The efficiency of the accelerated model meets the requirements of engineering applications, enabling rapid early warning and assessment of water pollution accidents.
Keywords: pollution transport and diffusion model; parallel computing; Compute Unified Device Architecture (CUDA); pollution event
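The abstract names a finite volume method with a TVD limiter but does not give Kong's r-factor, so the sketch below substitutes the textbook upwind-ratio r-factor and the minmod limiter on a periodic 1D uniform grid (assumptions: constant positive velocity, Courant number at most 1). It illustrates why limiting suppresses the overshoots that unlimited second-order schemes produce at sharp concentration fronts; an unstructured-grid CUDA version would typically evaluate each face flux in its own thread.

```python
from typing import List


def minmod(r: float) -> float:
    """Minmod flux limiter: phi(r) = max(0, min(1, r))."""
    return max(0.0, min(1.0, r))


def tvd_advect(u0: List[float], a: float, dx: float, dt: float, steps: int) -> List[float]:
    """Flux-limited upwind scheme for u_t + a u_x = 0 (a > 0) on a periodic grid."""
    c = a * dt / dx                                   # Courant number
    if not (a > 0 and c <= 1.0):
        raise ValueError("requires a > 0 and Courant number <= 1")
    n = len(u0)
    u = list(u0)
    for _ in range(steps):
        flux = [0.0] * n                              # flux[i] is F at face i+1/2
        for i in range(n):
            d_up = u[i] - u[i - 1]                    # upwind difference (u[-1] wraps)
            d_dn = u[(i + 1) % n] - u[i]              # downwind difference
            r = d_up / d_dn if d_dn != 0.0 else 0.0   # smoothness ratio (r-factor)
            # First-order upwind flux plus a limited Lax-Wendroff correction.
            flux[i] = a * u[i] + 0.5 * a * (1.0 - c) * minmod(r) * d_dn
        # Conservative update: flux[i - 1] is F at face i-1/2 (flux[-1] wraps).
        u = [u[i] - dt / dx * (flux[i] - flux[i - 1]) for i in range(n)]
    return u
```

Advecting a square pulse with this scheme keeps the solution within its initial bounds and conserves total mass, which is the property the limiter is meant to guarantee.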
HXPY: A High-Performance Data Processing Package for Financial Time-Series Data
4
Authors: 郭家栋, 彭靖姝, 苑航, 倪明选. Journal of Computer Science & Technology. Indexed in SCIE, EI, CSCD. 2023, No. 1, pp. 3-24, 22 pages.
A tremendous amount of data is generated by global financial markets every day, and such time-series data needs to be analyzed in real time to explore its potential value. In recent years, we have witnessed the successful adoption of machine learning models on financial data, where the importance of accuracy and timeliness demands highly effective computing frameworks. However, traditional financial time-series data processing frameworks have shown performance degradation and adaptation issues, such as outlier handling with stock suspensions in Pandas and TA-Lib. In this paper, we propose HXPY, a high-performance data processing package with a C++/Python interface for financial time-series data. HXPY supports miscellaneous acceleration techniques such as streaming algorithms, vectorized instruction sets, and memory optimization, together with various functions such as time window functions, group operations, down-sampling operations, cross-section operations, row-wise or column-wise operations, shape transformations, and alignment functions. The results of benchmark and incremental analysis demonstrate the superior performance of HXPY compared with its counterparts. On data ranging from MiBs to GiBs, HXPY significantly outperforms other in-memory dataframe computing rivals, by up to hundreds of times.
Keywords: dataframe; time-series data; SIMD (single instruction multiple data); CUDA (Compute Unified Device Architecture)
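HXPY's abstract lists streaming algorithms for time-window functions, and cites stock suspensions as a source of awkward gaps. The following Python sketch shows the streaming idea for a rolling mean: the window sum and valid-value count are updated in O(1) per new observation instead of re-scanning the window, and NaN entries (for example, suspended trading days) are skipped. The function name, signature, and `min_periods` semantics are illustrative assumptions, not HXPY's API.

```python
import math
from collections import deque
from typing import Deque, Iterable, List


def rolling_mean(values: Iterable[float], window: int, min_periods: int = 1) -> List[float]:
    """Streaming mean of the last `window` observations, ignoring NaN entries."""
    buf: Deque[float] = deque()
    total, count = 0.0, 0
    out: List[float] = []
    for x in values:
        buf.append(x)
        if not math.isnan(x):
            total += x
            count += 1
        if len(buf) > window:                     # slide: drop the oldest observation
            old = buf.popleft()
            if not math.isnan(old):
                total -= old
                count -= 1
        out.append(total / count if count >= min_periods else math.nan)
    return out
```

A production version would also periodically recompute the window sum from scratch to limit floating-point drift over very long series.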
A GPU-Accelerated Discontinuous Galerkin Method for Solving Two-Dimensional Laminar Flows (cited by 2)
5
Authors: GAO Huanqin, CHEN Hongquan, ZHANG Jiale, XU Shengguan, GAO Yukun. Transactions of Nanjing University of Aeronautics and Astronautics. Indexed in EI, CSCD. 2022, No. 4, pp. 450-466, 17 pages.
A graphics processing unit (GPU)-accelerated discontinuous Galerkin (DG) method is presented for solving two-dimensional laminar flows. The DG method is ported from the central processing unit to the GPU to achieve GPU speedup by programming under the compute unified device architecture (CUDA) model. The CUDA kernel subroutines are designed to meet the requirements of high-order DG computation. The corresponding data structures are constructed component-wise, and the thread hierarchy is organized cell-wise or edge-wise according to the integrals involved in solving the laminar Navier-Stokes equations, in which the inviscid and viscous flux terms are computed by the local Lax-Friedrichs scheme and the second scheme of Bassi & Rebay, respectively. A strong stability preserving Runge-Kutta scheme is then used for time marching of the numerical solutions. The resulting GPU-accelerated DG method is first validated on the classical Couette flow problem with different mesh sizes and different orders of approximation, which shows that the expected orders of convergence are achieved. Numerical simulations of typical flows over a circular cylinder and a NACA 0012 airfoil are then carried out, and the results are compared with analytical solutions or with experimental and numerical values reported in the literature, together with a performance analysis of the code in terms of GPU speedup. The results show that the computing time of the test cases is significantly reduced without loss of accuracy, with speedups of up to 69.7 times over the CPU counterpart.
Keywords: discontinuous Galerkin; GPU; Compute Unified Device Architecture (CUDA); Navier-Stokes equations; laminar flows
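The abstract says only that a strong stability preserving Runge-Kutta scheme advances the DG solution in time; the widely used third-order Shu-Osher variant is assumed in the sketch below. It is written for a generic semi-discrete system du/dt = L(u), where `L` would be the DG spatial operator (flux and volume integrals). Each stage is a convex combination of forward-Euler steps, which is what carries over the stability properties of forward Euler.

```python
from typing import Callable, List

Vector = List[float]


def ssp_rk3_step(u: Vector, dt: float, L: Callable[[Vector], Vector]) -> Vector:
    """One step of the third-order SSP Runge-Kutta scheme (Shu-Osher form)."""
    l0 = L(u)
    u1 = [ui + dt * li for ui, li in zip(u, l0)]                      # Euler stage
    l1 = L(u1)
    u2 = [0.75 * ui + 0.25 * (u1i + dt * li)                          # 3/4 u + 1/4 Euler(u1)
          for ui, u1i, li in zip(u, u1, l1)]
    l2 = L(u2)
    return [ui / 3.0 + 2.0 / 3.0 * (u2i + dt * li)                    # 1/3 u + 2/3 Euler(u2)
            for ui, u2i, li in zip(u, u2, l2)]
```

Halving the time step on a smooth test problem such as du/dt = -u should reduce the error by roughly a factor of 8, consistent with third-order accuracy.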
Solvers for Systems of Large Sparse Linear and Nonlinear Equations Based on Multi-GPUs (cited by 3)
6
Authors: 刘沙, 钟诚文, 陈效鹏. Transactions of Nanjing University of Aeronautics and Astronautics. Indexed in EI. 2011, No. 3, pp. 300-308, 9 pages.
Numerical treatment of engineering problems often results in solving systems of linear or nonlinear equations. The solution process on digital computers usually takes a tremendous amount of time because of the extremely large problem sizes encountered in most real-world engineering applications. Therefore, practical solvers for systems of linear and nonlinear equations based on multiple graphics processing units (GPUs) are proposed to accelerate the solution process. In the linear and nonlinear solvers, the preconditioned biconjugate gradient stabilized (PBi-CGstab) method and the inexact Newton method are used to achieve fast and stable convergence. Multiple GPUs are used to provide the additional data storage that large problems require.
Keywords: general-purpose graphics processing unit (GPGPU); Compute Unified Device Architecture (CUDA); system of linear equations; system of nonlinear equations; inexact Newton method; biconjugate gradient stabilized (Bi-CGstab) method
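For reference, the preconditioned BiCGSTAB iteration named in the abstract can be sketched serially as follows. The Jacobi (diagonal) preconditioner, the dense list-of-lists storage, and the function names are assumptions for brevity; the paper does not state its preconditioner, and a multi-GPU solver would use sparse storage and distribute the matrix-vector products and dot products across devices.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def matvec(A: Matrix, x: Vector) -> Vector:
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(x: Vector, y: Vector) -> float:
    return sum(a * b for a, b in zip(x, y))


def pbicgstab(A: Matrix, b: Vector, tol: float = 1e-10, max_iter: int = 200) -> Vector:
    """Jacobi-preconditioned BiCGSTAB for a (possibly nonsymmetric) system A x = b."""
    n = len(b)
    inv_diag = [1.0 / A[i][i] for i in range(n)]        # M^{-1} for M = diag(A)

    def precond(v: Vector) -> Vector:
        return [d * vi for d, vi in zip(inv_diag, v)]

    x = [0.0] * n
    r = [bi - axi for bi, axi in zip(b, matvec(A, x))]
    r_hat = r[:]                                        # fixed shadow residual
    rho_old = alpha = omega = 1.0
    v = [0.0] * n
    p = [0.0] * n
    b_norm = math.sqrt(dot(b, b)) or 1.0
    for _ in range(max_iter):
        rho = dot(r_hat, r)
        beta = (rho / rho_old) * (alpha / omega)
        p = [ri + beta * (pk - omega * vk) for ri, pk, vk in zip(r, p, v)]
        y = precond(p)
        v = matvec(A, y)
        alpha = rho / dot(r_hat, v)
        x = [xi + alpha * yi for xi, yi in zip(x, y)]   # half-step update
        s = [ri - alpha * vk for ri, vk in zip(r, v)]
        if math.sqrt(dot(s, s)) / b_norm < tol:
            break
        z = precond(s)
        t = matvec(A, z)
        omega = dot(t, s) / dot(t, t)                   # stabilizing (minimal residual) step
        x = [xi + omega * zi for xi, zi in zip(x, z)]
        r = [si - omega * ti for si, ti in zip(s, t)]
        if math.sqrt(dot(r, r)) / b_norm < tol:
            break
        rho_old = rho
    return x
```

In an inexact Newton solver, a routine like this would be called on each Jacobian system with a loose tolerance, which is where the "inexact" in the name comes from.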
High-performance solutions of geographically weighted regression in R (cited by 1)
7
Authors: Binbin Lu, Yigong Hu, Daisuke Murakami, Chris Brunsdon, Alexis Comber, Martin Charlton, Paul Harris. Geo-Spatial Information Science. Indexed in SCIE, EI, CSCD. 2022, No. 4, pp. 536-549, 14 pages.
As an established spatial analytical tool, Geographically Weighted Regression (GWR) has been applied across a variety of disciplines. However, its use can be challenging for large datasets, which are increasingly prevalent in today's digital world. In this study, we propose two high-performance R solutions for GWR via Multi-core Parallel (MP) and Compute Unified Device Architecture (CUDA) techniques: GWR-MP and GWR-CUDA, respectively. We compared GWR-MP and GWR-CUDA with three existing solutions available in Geographically Weighted Models (GWmodel), Multi-scale GWR (MGWR), and Fast GWR (FastGWR). Results showed that the five solutions perform differently across varying sample sizes, with no single solution a clear winner in terms of computational efficiency. Specifically, the solutions in GWmodel and MGWR had acceptable computational costs for GWR studies with relatively small sample sizes. For large sample sizes, GWR-MP and FastGWR provided coherent solutions on a Personal Computer (PC) with a common multi-core configuration, and GWR-MP provided more efficient computing capacity per core or thread than FastGWR. When the sample size was very large, and only in such cases, GWR-CUDA provided the most efficient solution, although its I/O cost with small samples should be noted. In summary, GWR-MP and GWR-CUDA provide high-performance R solutions that complement existing ones, and they should be preferred for certain data-rich GWR studies.
Keywords: non-stationarity; big data; parallel computing; Compute Unified Device Architecture (CUDA); Geographically Weighted Models (GWmodel)
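GWR fits a separate weighted least-squares regression at each location, weighting every observation by a distance-decay kernel. The sketch below assumes a single predictor, a Gaussian kernel, and a hand-picked fixed bandwidth (in practice the bandwidth is usually chosen by cross-validation or AICc); the function names are hypothetical, not GWmodel's API. The per-location loop is the natural unit of parallel work, though how GWR-MP and GWR-CUDA actually divide the work is not stated here.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def gwr_at(u: float, v: float, coords: List[Point], x: List[float], y: List[float],
           bandwidth: float) -> Tuple[float, float]:
    """Local (intercept, slope) at (u, v) by Gaussian-kernel weighted least squares."""
    sw = swx = swy = swxx = swxy = 0.0
    for (cx, cy), xi, yi in zip(coords, x, y):
        d = math.hypot(cx - u, cy - v)
        w = math.exp(-0.5 * (d / bandwidth) ** 2)     # Gaussian distance-decay weight
        sw += w
        swx += w * xi
        swy += w * yi
        swxx += w * xi * xi
        swxy += w * xi * yi
    # Solve the 2x2 weighted normal equations for y = b0 + b1 * x.
    slope = (sw * swxy - swx * swy) / (sw * swxx - swx * swx)
    intercept = (swy - slope * swx) / sw
    return intercept, slope


def gwr(coords: List[Point], x: List[float], y: List[float],
        bandwidth: float) -> List[Tuple[float, float]]:
    # Every location is an independent fit, so this loop parallelizes trivially.
    return [gwr_at(u, v, coords, x, y, bandwidth) for (u, v) in coords]
```

When the data follow one global relationship, every local fit should recover it; spatially varying relationships show up as coefficients that change across locations.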
Design and Implementation of a CUDA-Based Parallel Cuckoo Search Algorithm (cited by 2)
8
Authors: 韦向远, 杨辉华, 谢谱模. 计算机科学与探索 [Journal of Frontiers of Computer Science and Technology]. Indexed in CSCD. 2014, No. 6, pp. 665-673, 9 pages.
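The cuckoo search (CS) algorithm is an intelligent metaheuristic developed in recent years and has been successfully applied to a variety of optimization problems. To address the long computation times of CS when solving large-data, large-scale complex problems, a parallel cuckoo search algorithm based on the Compute Unified Device Architecture (CUDA) is proposed. The parallel implementation combines task parallelism with data parallelism: graphics processing unit (GPU) thread blocks and threads are mapped to individual cuckoos and to each dimension of an individual, respectively, and the nest position update, fitness evaluation, nest rebuilding, and best-individual search steps of CS are all performed in parallel. The entire CS optimization loop runs on the GPU, which reduces CPU-GPU communication overhead. Simulation experiments on four classic benchmark functions show that, compared with the standard CS algorithm, the CUDA-based parallel CS achieves a speedup of up to 110 times with consistent convergence.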
Keywords: cuckoo search algorithm; parallel computing; graphics processing unit (GPU); Compute Unified Device Architecture (CUDA)
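The following is a simplified, serial sketch of the standard cuckoo search loop the paper parallelizes: Lévy-flight moves generated with Mantegna's algorithm, greedy replacement, and rebuilding of abandoned nests from random differences between nests. The parameter defaults (15 nests, pa = 0.25, step scale 0.01, beta = 1.5) are common choices assumed here, and this abandonment rule (perturb each component with probability pa) is one of several variants in use. In the paper's mapping, the outer loop over nests would correspond to thread blocks and the inner per-dimension work to threads.

```python
import math
import random
from typing import Callable, List, Tuple


def levy_step(beta: float, rng: random.Random) -> float:
    """One Levy-distributed step length via Mantegna's algorithm."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.gauss(0.0, sigma)
    v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)


def cuckoo_search(f: Callable[[List[float]], float], dim: int, lower: float, upper: float,
                  n_nests: int = 15, pa: float = 0.25, alpha: float = 0.01,
                  beta: float = 1.5, iters: int = 500,
                  seed: int = 0) -> Tuple[List[float], float]:
    rng = random.Random(seed)

    def clip(z: float) -> float:
        return min(max(z, lower), upper)

    nests = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(n_nests)]
    fit = [f(x) for x in nests]
    i_best = min(range(n_nests), key=fit.__getitem__)
    best, best_f = nests[i_best][:], fit[i_best]
    for _ in range(iters):
        # 1) Levy flights scaled by the distance to the current best; keep improvements.
        for i in range(n_nests):
            cand = [clip(x + alpha * levy_step(beta, rng) * (x - b) * rng.gauss(0.0, 1.0))
                    for x, b in zip(nests[i], best)]
            fc = f(cand)
            if fc < fit[i]:
                nests[i], fit[i] = cand, fc
        # 2) Abandon/rebuild: perturb each component with probability pa
        #    using the difference of two randomly chosen nests.
        perm1 = rng.sample(range(n_nests), n_nests)
        perm2 = rng.sample(range(n_nests), n_nests)
        for i in range(n_nests):
            r = rng.random()
            cand = [clip(x + r * (a - c)) if rng.random() < pa else x
                    for x, a, c in zip(nests[i], nests[perm1[i]], nests[perm2[i]])]
            fc = f(cand)
            if fc < fit[i]:
                nests[i], fit[i] = cand, fc
        # 3) Track the global best (a parallel reduction on the GPU).
        i_best = min(range(n_nests), key=fit.__getitem__)
        if fit[i_best] < best_f:
            best, best_f = nests[i_best][:], fit[i_best]
    return best, best_f
```

On a simple benchmark such as the sphere function, the returned best fitness should approach zero as iterations increase.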
Implementation of the moving particle semi-implicit method on GPU (cited by 2)
9
Authors: ZHU XiaoSong, CHENG Liang, LU Lin, TENG Bin. Science China (Physics, Mechanics & Astronomy). Indexed in SCIE, EI, CAS. 2011, No. 3, pp. 523-532, 10 pages.
The Moving Particle Semi-implicit (MPS) method performs well in simulating violent free surface flows and has therefore become popular in fluid flow simulation. However, searching for neighbouring particles and solving the large sparse matrix equations (Poisson-type equations) are very time-consuming. To exploit the tremendous parallel computing power of Graphics Processing Units (GPUs), this study developed a GPU-based MPS model using the Compute Unified Device Architecture (CUDA) on an NVIDIA GTX 280. Efficient neighbouring particle search is done through an indirect method, and the Poisson-type pressure equation is solved by the Bi-Conjugate Gradient (BiCG) method. Four optimization levels for the present general parallel GPU-based MPS model are demonstrated, and detailed optimization of the GPU code is also discussed. A benchmark dam-break flow is simulated with both the present GPU-based MPS code and the original CPU-based MPS code. The comparison shows that the GPU-based MPS model is 26 times faster than the traditional CPU model.
Keywords: moving particle semi-implicit method (MPS); graphics processing units (GPU); Compute Unified Device Architecture (CUDA); neighbouring particle searching; free surface flow
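The abstract says only that neighbour search uses "an indirect method". A common indirect approach in particle methods, assumed here for illustration, is a cell (bucket) list: particles are binned into square cells whose side equals the influence radius `re`, so each particle only checks the 3×3 block of surrounding cells instead of every other particle. The 2D setting and function names are hypothetical; GPU versions typically sort particles by cell index rather than using a hash map.

```python
import math
from collections import defaultdict
from typing import DefaultDict, List, Tuple

Point = Tuple[float, float]


def build_cells(pos: List[Point], re: float) -> DefaultDict[Tuple[int, int], List[int]]:
    """Bin particle indices into square cells of side `re`."""
    cells: DefaultDict[Tuple[int, int], List[int]] = defaultdict(list)
    for idx, (x, y) in enumerate(pos):
        cells[(math.floor(x / re), math.floor(y / re))].append(idx)
    return cells


def neighbours(pos: List[Point], re: float) -> List[List[int]]:
    """For each particle, the indices of other particles closer than `re`."""
    cells = build_cells(pos, re)
    result: List[List[int]] = [[] for _ in pos]
    for i, (x, y) in enumerate(pos):
        cx, cy = math.floor(x / re), math.floor(y / re)
        # With cell side = re, any neighbour lies in the 3x3 block around (cx, cy).
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in cells.get((cx + dx, cy + dy), ()):
                    if j != i and (pos[j][0] - x) ** 2 + (pos[j][1] - y) ** 2 < re * re:
                        result[i].append(j)
    return result
```

This reduces the search from O(N²) pair checks to roughly O(N) for uniformly distributed particles, which is the motivation for avoiding direct all-pairs search.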
A two-stage CO-PSO minimum structure inversion using CUDA for extracting IP information from MT data (cited by 1)
10
Authors: 董莉, 李帝铨, 江沸菠. Journal of Central South University. Indexed in SCIE, EI, CAS, CSCD. 2018, No. 5, pp. 1195-1212, 18 pages.
Extracting induced polarization (IP) information from magnetotelluric (MT) sounding data is of great practical significance for the exploitation of deep mineral, oil, and gas resources. The linear inversion method, which has been favored in previous research on IP information extraction, has three main problems: 1) dependency on the initial model, 2) a tendency to fall into local minima, and 3) serious non-uniqueness of solutions. Considering the nonlinearity and nonconvexity of IP information extraction, a two-stage CO-PSO minimum structure inversion method using the compute unified device architecture (CUDA) is proposed. On the one hand, a novel Cauchy oscillation particle swarm optimization (CO-PSO) algorithm is applied to extract nonlinear IP information from MT sounding data and is implemented as a parallel algorithm on the CUDA computing architecture. On the other hand, the influence of polarizability on the observed data is strengthened by introducing a second inversion stage, and a regularization parameter is included in the fitness function of the PSO algorithm to address the non-uniqueness of the inversion. Inversion simulations of polarization layers in different strata of various geoelectric models show that the proposed algorithm obtains smooth models of resistivity and IP parameters, with relatively stable and accurate results. Experiments with added noise indicate that the method is robust to Gaussian white noise. Compared with the traditional PSO and GA algorithms, the proposed algorithm is more efficient and gives better inversion results.
Keywords: Cauchy oscillation particle swarm optimization; magnetotelluric sounding; nonlinear inversion; induced polarization (IP) information extraction; Compute Unified Device Architecture (CUDA)
Graphic Processing Unit Based Phase Retrieval and CT Reconstruction for Differential X-Ray Phase Contrast Imaging
11
Authors: 陈晓庆, 王宇杰, 孙建奇. Journal of Shanghai Jiaotong University (Science). Indexed in EI. 2014, No. 5, pp. 550-554, 5 pages.
Compared with conventional X-ray absorption imaging, X-ray phase-contrast imaging provides higher contrast for samples with low attenuation coefficients, such as blood vessels and soft tissues. Among phase-contrast imaging modalities, grating-based phase-contrast imaging has been widely accepted because it accommodates a wide range of samples and does not require a coherent source. Its downside, however, is the substantially larger amount of data generated by the phase-stepping method, which slows down reconstruction. The graphics processing unit (GPU) enables parallel computing, which is very useful for processing large quantities of data. In this paper, a compute unified device architecture (CUDA) C program running on the GPU is introduced to accelerate the phase retrieval and filtered back projection (FBP) algorithms for grating-based tomography. Depending on the data size, the CUDA C program achieves different speedups over the standard C program on the same Visual Studio 2010 platform, and the speedup ratio increases with data size.
Keywords: grating-based phase-contrast imaging; parallel computing; graphics processing unit (GPU); Compute Unified Device Architecture (CUDA); filtered back projection (FBP)
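In phase-stepping, each detector pixel records an intensity curve over N equally spaced grating positions, commonly modeled as I_k = a + b·cos(φ + 2πk/N). The first Fourier coefficient of that curve yields the mean (absorption) signal a, the phase φ, and the visibility b/a. The per-pixel Python sketch below assumes this model and sign convention; the function name is hypothetical. Because every pixel is independent, a CUDA version would naturally assign one thread per pixel, and the differential phase signal is then obtained by subtracting the phase of a reference (no-sample) scan.

```python
import math
from typing import List, Tuple


def retrieve_phase_stepping(intensities: List[float]) -> Tuple[float, float, float]:
    """Per-pixel (mean, phase, visibility) from N equally spaced phase steps."""
    n = len(intensities)
    if n < 3:
        raise ValueError("need at least 3 phase steps")
    # First Fourier coefficient: sum_k I_k * exp(-2*pi*i*k/n) = (n*b/2) * exp(i*phi)
    c_re = sum(I * math.cos(2 * math.pi * k / n) for k, I in enumerate(intensities))
    c_im = -sum(I * math.sin(2 * math.pi * k / n) for k, I in enumerate(intensities))
    mean = sum(intensities) / n                     # a: absorption (transmission) signal
    amplitude = 2.0 * math.hypot(c_re, c_im) / n    # b: oscillation amplitude
    phase = math.atan2(c_im, c_re)                  # phi, wrapped to (-pi, pi]
    return mean, phase, amplitude / mean
```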
GPU based numerical simulation of core shooting process
12
Authors: Yi-zhong Zhang, Gao-chun Lu, Chang-jiang Ni, Tao Jing, Lin-long Yang, Qin-fang Wu. China Foundry. Indexed in SCIE. 2017, No. 5, pp. 392-397, 6 pages.
The core shooting process is the most widely used technique for making sand cores, and it plays an important role in their quality. Although numerical simulation can potentially optimize the core shooting process, research on simulating it is very limited. Based on a two-fluid model (TFM) and a kinetic-friction constitutive correlation, a program for 3D numerical simulation of the core shooting process has been developed and has achieved good agreement with in-situ experiments. To meet the needs of engineering applications, a graphics processing unit (GPU) has also been used to improve computational efficiency. The parallel algorithm based on the Compute Unified Device Architecture (CUDA) platform can significantly decrease computing time through multi-threaded GPU execution. In this work, the program accelerated by CUDA parallelization was developed, and its accuracy was verified against in-situ experimental results recorded by a high-speed camera. The design and optimization of the parallel algorithm are discussed. Simulation of a sand core test piece demonstrated the improvement in computational efficiency provided by the GPU. The program has also been validated by in-situ experiments with a transparent core box, a high-speed camera, and a pressure measuring system. The computing time of the parallel program was reduced by nearly 95%, while the simulation results remained consistent with the experimental data. The GPU parallelization method successfully solves the low computational efficiency of the 3D sand shooting simulation program, making the GPU program suitable for engineering applications.
Keywords: graphics processing unit (GPU); Compute Unified Device Architecture (CUDA); parallelization; core shooting process
Efficient Dictionary-Based Compression and Decompression on the GPU Platform
13
Authors: 覃子姗, 顾璠, 秦晓科, 陈铭松. 计算机科学与探索 [Journal of Frontiers of Computer Science and Technology]. Indexed in CSCD. 2014, No. 5, pp. 525-536, 12 pages.
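Compression is widely used in data storage and transmission. However, because of its inherently serial nature, most existing dictionary-based compression and decompression algorithms are designed to run serially on the CPU. To explore the potential performance gains of using a graphics processing unit (GPU) for compression and decompression, this work combines coalesced memory access with parallel assembly and studies two parallel compression and decompression methods on the CUDA (Compute Unified Device Architecture) platform: dictionary-based stateless compression and dictionary-based LZW compression. Experimental results show that, compared with traditional single-core implementations, the proposed methods significantly improve the performance of existing serial dictionary-based compression and decompression algorithms.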
Keywords: graphics processing unit (GPU); Compute Unified Device Architecture (CUDA); dictionary-based compression and decompression
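The abstract names dictionary-based LZW compression but gives no algorithmic detail. For reference, a serial LZW codec is sketched below, together with a hypothetical `compress_chunks` helper that shows one common way to expose parallelism: splitting the input into independent chunks, each with its own dictionary. Whether the paper partitions work this way is an assumption, and its coalesced-memory and parallel-assembly steps are not modeled.

```python
from typing import List


def lzw_compress(data: bytes) -> List[int]:
    table = {bytes([i]): i for i in range(256)}     # start with all single bytes
    w = b""
    out: List[int] = []
    for byte in data:
        wc = w + bytes([byte])
        if wc in table:
            w = wc                                  # keep extending the current match
        else:
            out.append(table[w])                    # emit code for the longest match
            table[wc] = len(table)                  # learn the new string
            w = bytes([byte])
    if w:
        out.append(table[w])
    return out


def lzw_decompress(codes: List[int]) -> bytes:
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    w = table[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            entry = w + w[:1]                       # code defined by this very step (cScSc case)
        else:
            raise ValueError(f"invalid LZW code {code}")
        out.append(entry)
        table[len(table)] = w + entry[:1]           # mirror the encoder's dictionary growth
        w = entry
    return b"".join(out)


def compress_chunks(data: bytes, chunk: int) -> List[List[int]]:
    """Independent per-chunk dictionaries, so chunks could be processed in parallel."""
    return [lzw_compress(data[i:i + chunk]) for i in range(0, len(data), chunk)]
```

Per-chunk dictionaries trade some compression ratio (each chunk starts with an empty dictionary) for independence between chunks, which is what makes parallel execution possible.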
A Multi-Clock Coupled Simulation Technique for SIMD Architectures
14
Authors: 何义, 何圣, 彭向军, 戴健, 张春元. 软件 [Software]. 2011, No. 9, pp. 45-48, 4 pages.
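As demand for data-level parallelism grows in application domains such as media processing and scientific computing, SIMD architectures, with their inherently scalable data-parallel structure, have been widely adopted, and system scale keeps increasing. This makes simulation and testing of SIMD architectures increasingly difficult and sharpens the conflict between simulation speed and cost. This paper proposes a multi-clock coupled simulation technique for SIMD architectures that uses multiple clocks of different frequencies to drive different functional modules of the simulation system, enabling time-division multiplexing of the compute units. Experimental results show that the technique effectively increases the simulation capacity of an FPGA chip, makes the simulation system more flexible and configurable, and reduces the cost of hardware simulation.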
Keywords: computer architecture; SIMD; FPGA; simulation
Hybrid domain multipactor prediction algorithm and its CUDA parallel implementation
15
Authors: WU Peiyu, XIE Yongjun, NIU Liqiang, JIANG Haolin. Journal of Systems Engineering and Electronics. Indexed in SCIE, EI, CSCD. 2020, No. 6, pp. 1097-1104, 8 pages.
Based on the finite element method (FEM) in the frequency domain and the particle-in-cell approach in the time domain, a hybrid domain multipactor threshold prediction algorithm is proposed in this paper. The proposed algorithm combines the advantages of frequency-domain and time-domain algorithms: high computational accuracy and considerable computational efficiency. In addition, compute unified device architecture (CUDA) acceleration can be employed to further improve simulation efficiency. Numerical examples demonstrate the effectiveness of the proposed algorithm. The results indicate that the multipactor threshold can be accurately predicted and that computational efficiency is improved.
Keywords: Compute Unified Device Architecture (CUDA); finite element method (FEM); hybrid domain; multipactor threshold prediction; particle-in-cell (PIC)
A GPU-Accelerated Implementation of Seismic Prestack Time Migration (cited by 73)
16
Authors: 李博, 刘国峰, 刘洪. 地球物理学报 [Chinese Journal of Geophysics]. Indexed in SCIE, EI, CAS, CSCD, PKU Core. 2009, No. 1, pp. 245-252, 8 pages.
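General-purpose computing on graphics processing units (GPUs), a recent development, has matured into practical use and attracted broad attention across many application fields. In specialized data processing for oil and gas exploration, the large performance gap between GPUs and central processing units (CPUs) has prompted active research on applying GPU general-purpose computing in the petroleum industry. Taking prestack time migration, which is widely used in oil and gas exploration, as an example, this paper briefly demonstrates the effectiveness of GPU-based applications. It also proposes a software-component approach to implementing seismic prestack time migration on the GPU and provides a concrete implementation architecture for application software extended to prestack time migration with asymmetric traveltimes. Compared with prestack time migration previously run on personal computers (PCs) or PC clusters, the method greatly improves computational efficiency and can significantly reduce computing and maintenance costs in petroleum geophysical data processing. The practical examples in the paper also show that GPU-based high-performance parallel computing is an important path for meeting the large-scale computing demands of today's petroleum industry.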
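Keywords: asymmetric-traveltime prestack time migration; graphics processing unit; GPU general-purpose computing; Compute Unified Device Architecture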
Design and Implementation of n-Tier Information Systems under the .NET Framework (cited by 14)
17
Authors: 李波, 王娓娓, 何建敏. 计算机与现代化 [Computer and Modernization]. 2005, No. 1, pp. 60-62, 3 pages.
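The computing model of an information system is an important issue in research on information system architecture. This paper discusses an Internet-oriented n-tier distributed computing model and the influence of the .NET Framework on the design and implementation of n-tier systems.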
Keywords: information systems; information system computing model; information system architecture
Implementation of GPU-Based Parallel Power Flow Calculation for Power Systems (cited by 34)
18
Authors: 夏俊峰, 杨帆, 李静, 郑秀玉. 电力系统保护与控制 [Power System Protection and Control]. Indexed in EI, CSCD, PKU Core. 2010, No. 18, pp. 100-103 and 110, 5 pages.
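Building on a study of GPU general-purpose computing methods and power flow algorithms, and targeting the GPU's compute-intensive, highly parallel characteristics, the Newton method for power flow calculation is suitably simplified, and a design method for a GPU-based parallel power flow program is proposed using the Compute Unified Device Architecture (CUDA) development platform. Simulation results show that the algorithm is feasible and computationally efficient, offering a practical approach for research on parallel power flow calculation in power systems.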
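Keywords: power flow calculation; parallel computing; GPU general-purpose computing; Compute Unified Device Architecture; Newton method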
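The abstract refers to a simplified Newton method without giving details. As a serial illustration of the underlying Newton iteration only, the sketch below solves the smallest nontrivial case: a slack bus (bus 1, V = 1∠0 p.u.) feeding one PQ load bus through a single line of impedance `z`. The two-bus setup, the per-unit values, and the finite-difference Jacobian are assumptions chosen for brevity; a practical solver, and presumably the paper's GPU program, assembles an analytic Jacobian over many buses.

```python
import cmath
from typing import Tuple


def power_injection(v2: complex, y: complex, v1: complex = 1 + 0j) -> complex:
    """Complex power injected at bus 2: S2 = V2 * conj(I2), with I2 = y * (V2 - V1)."""
    return v2 * (y * (v2 - v1)).conjugate()


def newton_power_flow(p_spec: float, q_spec: float, z: complex,
                      tol: float = 1e-10, max_iter: int = 20) -> Tuple[float, float]:
    """Newton iteration for (theta2, |V2|) of a PQ bus fed from a slack bus."""
    y = 1 / z

    def mismatch(theta: float, vm: float) -> Tuple[float, float]:
        s = power_injection(cmath.rect(vm, theta), y)
        return p_spec - s.real, q_spec - s.imag

    theta, vm = 0.0, 1.0                                # flat start
    h = 1e-7                                            # finite-difference step
    for _ in range(max_iter):
        dp, dq = mismatch(theta, vm)
        if max(abs(dp), abs(dq)) < tol:
            return theta, vm
        # Jacobian of (P, Q) w.r.t. (theta, V) from forward differences of the mismatch.
        dp_t, dq_t = mismatch(theta + h, vm)
        dp_v, dq_v = mismatch(theta, vm + h)
        j11, j21 = (dp - dp_t) / h, (dq - dq_t) / h
        j12, j22 = (dp - dp_v) / h, (dq - dq_v) / h
        det = j11 * j22 - j12 * j21
        # Solve J * [d_theta, d_v] = [dp, dq] (Cramer's rule for the 2x2 case).
        theta += (j22 * dp - j12 * dq) / det
        vm += (-j21 * dp + j11 * dq) / det
    raise RuntimeError("power flow did not converge")
```

For a load (negative injected P and Q), the converged solution should show a bus-2 voltage slightly below 1 p.u. and a lagging angle.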
Fast CUDA-Based Registration of High-Resolution Digital Video Images (cited by 27)
19
Authors: 闫钧华, 杭谊青, 许俊峰, 储林臻. 仪器仪表学报 [Chinese Journal of Scientific Instrument]. Indexed in EI, CAS, CSCD, PKU Core. 2014, No. 2, pp. 380-386, 7 pages.
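High-resolution digital video produces a huge volume of image data, and SIFT-based image registration takes a very long time when implemented on the CPU. To address this, the study focuses on the three most time-consuming parts of the registration algorithm (SIFT feature extraction; SIFT feature matching; and RANSAC-based refinement of matched point pairs to solve for the transformation model parameters) and develops parallel algorithms for them. High-resolution digital video image registration is then implemented in parallel on CUDA. Experimental results show that, with similar registration quality, the CUDA implementation of SIFT-based registration is more than 100 times faster than the CPU implementation, and the speedup increases significantly as the number of image pixels grows.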
Keywords: image registration; high resolution; digital video; CUDA
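RANSAC, the third stage named in the abstract, repeatedly fits a model to a minimal random sample of matches and keeps the hypothesis with the largest consensus set. The transformation model used in the paper is not stated; the Python sketch below assumes a pure 2D translation (one match per hypothesis) to keep it short, whereas real registration typically uses an affine or projective model. Because hypotheses are scored independently, the scoring loop is the part most amenable to GPU parallelism.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]


def ransac_translation(src: List[Point], dst: List[Point], threshold: float = 1.0,
                       iters: int = 200, seed: int = 0) -> Tuple[Point, List[int]]:
    """Estimate a 2D translation dst ~ src + t robustly; returns (t, inlier indices)."""
    rng = random.Random(seed)
    best_inliers: List[int] = []
    for _ in range(iters):
        k = rng.randrange(len(src))                       # minimal sample: one match
        tx, ty = dst[k][0] - src[k][0], dst[k][1] - src[k][1]
        inliers = [i for i, ((sx, sy), (dx, dy)) in enumerate(zip(src, dst))
                   if (sx + tx - dx) ** 2 + (sy + ty - dy) ** 2 <= threshold ** 2]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refit on the consensus set: the least-squares translation is the mean offset.
    n = len(best_inliers)
    tx = sum(dst[i][0] - src[i][0] for i in best_inliers) / n
    ty = sum(dst[i][1] - src[i][1] for i in best_inliers) / n
    return (tx, ty), best_inliers
```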
GPU-Based Parallel Optimization Techniques (cited by 23)
20
Authors: 左颢睿, 张启衡, 徐勇, 赵汝进. 计算机应用研究 [Application Research of Computers]. Indexed in CSCD, PKU Core. 2009, No. 11, pp. 4115-4118, 4 pages.
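Because standard parallel algorithms struggle to run efficiently on graphics processing units (GPUs), this paper takes the summation (accumulation) algorithm as an example and presents four optimization methods on NVIDIA's Compute Unified Device Architecture (CUDA) GPUs: instruction optimization, shared-memory conflict avoidance, loop unrolling, and thread overloading. Experimental results show that parallel optimization effectively improves execution efficiency on the GPU: the optimized summation algorithm runs about 34 times faster than the standard parallel algorithm and about 70 times faster than a serial CPU implementation.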
Keywords: graphics processing unit; parallel optimization; summation; Compute Unified Device Architecture
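To make these optimizations concrete, the following Python sketch serially simulates the shape of a shared-memory sum reduction. Each thread first accumulates several strided elements (interpreting "thread overloading" as multiple elements per thread, an assumption), then the block halves the active stride at each step so that active threads stay contiguous, the sequential-addressing pattern commonly used to avoid shared-memory bank conflicts. The block size, items-per-thread parameter, and function name are illustrative; each inner `for tid` loop stands for threads that a real CUDA kernel would run concurrently between `__syncthreads()` barriers.

```python
from typing import List


def block_reduce(values: List[float], block_size: int = 256,
                 items_per_thread: int = 4) -> float:
    """Serial simulation of a two-level GPU sum reduction."""
    if block_size & (block_size - 1):
        raise ValueError("block_size must be a power of two")
    chunk = block_size * items_per_thread
    partials = []
    for start in range(0, len(values), chunk):            # one iteration = one thread block
        block = values[start:start + chunk]
        # Thread overloading: thread `tid` sums elements tid, tid + block_size, ...
        # (adjacent threads touch adjacent elements, i.e. coalesced loads).
        shared = [sum(block[tid::block_size]) for tid in range(block_size)]
        stride = block_size // 2
        while stride > 0:                                  # tree reduction in shared memory
            for tid in range(stride):                      # threads 0..stride-1 are active
                shared[tid] += shared[tid + stride]
            stride //= 2                                   # a barrier separates the steps
        partials.append(shared[0])
    return sum(partials)                                   # second pass over block results
```

Loop unrolling, another of the listed optimizations, would replace the last few iterations of the `while` loop with straight-line code once the stride drops below the warp size.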