510 articles found
Heterogeneous parallel computing accelerated iterative subpixel digital image correlation (Cited by: 9)
1
Authors: HUANG JianWen, ZHANG LingQi, JIANG ZhenYu, DONG ShouBin, CHEN Wei, LIU YiPing, LIU ZeJia, ZHOU LiCheng, TANG LiQun. Science China (Technological Sciences) [SCIE, EI, CAS, CSCD], 2018, No. 1, pp. 74-85 (12 pages)
Parallel computing techniques have been introduced into digital image correlation (DIC) in recent years and have led to a surge in computation speed. Graphics processing unit (GPU)-based parallel computing has demonstrated a surprising effect on accelerating iterative subpixel DIC compared with CPU-based parallel computing. In this paper, the performances of the two kinds of parallel computing techniques are compared for the previously proposed path-independent DIC method, in which the initial guess for the inverse compositional Gauss-Newton (IC-GN) algorithm at each point of interest (POI) is estimated through the fast Fourier transform-based cross-correlation (FFT-CC) algorithm. Based on the performance evaluation, a heterogeneous parallel computing (HPC) model is proposed with a hybrid mode of parallelisms in order to combine the computing power of the GPU and the multicore CPU. A scheme of trial computation tests is developed to optimize the configuration of the HPC model on a specific computer. The proposed HPC model shows excellent performance on a middle-end desktop computer for real-time subpixel DIC with a high resolution of more than 10000 POIs per frame.
Keywords: digital image correlation (DIC); inverse compositional Gauss-Newton (IC-GN) algorithm; heterogeneous parallel computing; graphics processing unit (GPU); multicore CPU; real-time DIC
Volumetric lattice Boltzmann method for pore-scale mass diffusion-advection process in geopolymer porous structures (Cited by: 1)
2
Authors: Xiaoyu Zhang, Zirui Mao, Floyd W. Hilty, Yulan Li, Agnes Grandjean, Robert Montgomery, Hans-Conrad zur Loye, Huidan Yu, Shenyang Hu. Journal of Rock Mechanics and Geotechnical Engineering [SCIE, CSCD], 2024, No. 6, pp. 2126-2136 (11 pages)
Porous materials present significant advantages for absorbing radioactive isotopes in nuclear waste streams. To improve absorption efficiency in nuclear waste treatment, a thorough understanding of the diffusion-advection process within porous structures is essential for material design. In this study, we present advancements in the volumetric lattice Boltzmann method (VLBM) for modeling and simulating pore-scale diffusion-advection of radioactive isotopes within geopolymer porous structures. These structures are created using the phase field method (PFM) to precisely control pore architectures. In our VLBM approach, we introduce a concentration field of an isotope seamlessly coupled with the velocity field and solve it by the time evolution of its particle population function. To address the computational intensity inherent in the coupled lattice Boltzmann equations for velocity and concentration fields, we implement graphics processing unit (GPU) parallelization. Validation of the developed model involves examining the flow and diffusion fields in porous structures. Remarkably, good agreement is observed both for the velocity field from VLBM and the multiphysics object-oriented simulation environment (MOOSE), and for the concentration field from VLBM and the finite difference method (FDM). Furthermore, we investigate the effects of background flow, species diffusivity, and porosity on the diffusion-advection behavior by varying the background flow velocity, diffusion coefficient, and pore volume fraction, respectively. Notably, all three parameters exert an influence on the diffusion-advection process. Increased background flow and diffusivity markedly accelerate the process due to increased advection intensity and enhanced diffusion capability, respectively. Conversely, increasing the porosity has a less significant effect, causing a slight slowdown of the diffusion-advection process due to the expanded pore volume. This comprehensive parametric study provides valuable insights into the kinetics of isotope uptake in porous structures, facilitating the de...
Keywords: volumetric lattice Boltzmann method (VLBM); phase field method (PFM); pore-scale diffusion-advection; nuclear waste treatment; porous media flow; graphics processing unit (GPU) parallelization
Study on the particle breakage of ballast based on a GPU accelerated discrete element method (Cited by: 4)
3
Authors: Guang-Yu Liu, Wen-Jie Xu, Qi-Cheng Sun, Nicolin Govender. Geoscience Frontiers [SCIE, CAS, CSCD], 2020, No. 2, pp. 461-471 (11 pages)
Particle breakage greatly influences the mechanical behavior of granular materials (GM), such as ballast, rockfill and sand, under external loads. The discrete element method (DEM) is one of the most popular methods for simulating GM, as each particle is represented on its own. To study the mechanism of particle breakage, a cohesive contact model is developed based on the GPU-accelerated DEM code Blaze-DEM. A database of 3D geometry models of rock blocks is established based on the 3D scanning method. An agglomerate describing the rock block with a series of non-overlapping spherical particles is used to build the DEM numerical model of a railway ballast sample, which is used in a DEM oedometric test to study the particle breakage characteristics of the sample under external load. Furthermore, to obtain the meso-mechanical parameters used in DEM, a back-analysis method is used based on laboratory tests of the rock sample. Based on the DEM numerical tests, the particle breakage process and mechanisms of the railway ballast are studied. All results show that the developed code is well suited to large-scale simulation of particle breakage in granular material.
Keywords: discrete element method (DEM); particle breakage; graphical processing unit (GPU); railway ballast; granular material (GM)
Design and implementation of a multi-tile parallel scanning rasterization accelerator
4
Authors: Xing Lidong, Guo Qiang, Peng Xinlong, Feng Zhenfu. The Journal of China Universities of Posts and Telecommunications [EI, CSCD], 2024, No. 2, pp. 94-104 (11 pages)
In the design of a graphic processing unit (GPU), the processing speed of triangle rasterization is an important factor that determines the performance of the GPU. An architecture of a multi-tile parallel-scan rasterization accelerator is proposed in this paper. The accelerator uses a bounding box algorithm to improve scanning efficiency. It rasterizes multiple tiles in parallel and scans multiple lines at the same time within each tile. This highly parallel approach drastically improves the performance of rasterization. Using the 65 nm process standard cell library of Semiconductor Manufacturing International Corporation (SMIC), the accelerator can be synthesized to a maximum clock frequency of 220 MHz. An implementation on the Genesys2 field programmable gate array (FPGA) board fully verifies the functionality of the accelerator. The implementation shows a significant improvement in rendering speed and efficiency and proves its suitability for high-performance rasterization.
Keywords: graphic processing unit (GPU); rasterization; multi-tile; parallelism
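The bounding box idea mentioned in this abstract can be illustrated with a minimal software sketch. It is not the accelerator's hardware design: the function names `edge` and `rasterize`, the pixel-center sampling rule, and the counter-clockwise vertex order are assumptions made for illustration only. The sketch scans just the pixels inside the triangle's screen-clamped bounding box and keeps those that pass three edge tests; the accelerator described above would process several tiles and scan lines at once, whereas the loops here run serially.

```python
import math


def edge(ax, ay, bx, by, px, py):
    """Signed area test: positive when P lies to the left of the directed edge A->B."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def rasterize(tri, width, height):
    """Return the pixels whose centers lie inside a counter-clockwise triangle."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    # Clamp the bounding box to the screen so only candidate pixels are visited.
    min_x = max(math.floor(min(x0, x1, x2)), 0)
    max_x = min(math.floor(max(x0, x1, x2)), width - 1)
    min_y = max(math.floor(min(y0, y1, y2)), 0)
    max_y = min(math.floor(max(y0, y1, y2)), height - 1)
    covered = set()
    for y in range(min_y, max_y + 1):        # parallel hardware would scan several lines at once
        for x in range(min_x, max_x + 1):
            px, py = x + 0.5, y + 0.5        # sample at the pixel center
            if (edge(x0, y0, x1, y1, px, py) >= 0
                    and edge(x1, y1, x2, y2, px, py) >= 0
                    and edge(x2, y2, x0, y0, px, py) >= 0):
                covered.add((x, y))
    return covered


pixels = rasterize([(0, 0), (4, 0), (0, 4)], width=8, height=8)
```

For this triangle the bounding box limits the scan to a 5 x 5 block of an 8 x 8 screen, which is the efficiency gain the bounding box provides.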
Efficient Knowledge Graph Embedding Training Framework with Multiple GPUs (Cited by: 1)
5
Authors: Ding Sun, Zhen Huang, Dongsheng Li, Min Guo. Tsinghua Science and Technology [SCIE, EI, CAS, CSCD], 2023, No. 1, pp. 167-175 (9 pages)
When training a large-scale knowledge graph embedding (KGE) model with multiple graphics processing units (GPUs), the partition-based method is necessary for parallel training. However, existing partition-based training methods suffer from low GPU utilization and high input/output (IO) overhead between the memory and disk. To address the high IO overhead between the disk and memory, we optimized the twice partitioning with fine-grained GPU scheduling to reduce the IO overhead between the CPU memory and disk. To address the low GPU utilization caused by GPU load imbalance, we proposed balanced partitioning and dynamic scheduling methods to accelerate training in different cases. With the above methods, we proposed fine-grained partitioning KGE, an efficient KGE training framework with multiple GPUs. We conducted experiments on some knowledge graph benchmarks, and the results show that our method achieves a speedup over existing frameworks in KGE training.
Keywords: knowledge graph embedding; parallel algorithm; partitioning graph; framework; graphics processing unit (GPU)
GPU-ACCELERATED FEM SOLVER FOR THREE DIMENSIONAL ELECTROMAGNETIC ANALYSIS (Cited by: 2)
6
Authors: Tian Jin, Gong Li, Shi Xiaowei, Le Xu. Journal of Electronics (China), 2011, No. 4, pp. 615-622 (8 pages)
A new Graphics Processing Unit (GPU) parallelization strategy is proposed to accelerate sparse finite element computation for three dimensional electromagnetic analysis. The parallelization strategy is based on a new compression format called sliced ELL Four (sliced ELL-F). The sliced ELL-F format-based parallelization strategy is designed to speed up the many addition, dot product, and Sparse Matrix Vector Product (SMVP) operations in the Conjugate Gradient Norm (CGN) calculation of finite element equations. The new implementation of SMVP on GPUs is evaluated. The proposed strategy executed on a GPU can efficiently solve sparse finite element equations, especially when the equations are extremely sparse (the size of most rows in the coefficient matrix is less than 8). Numerical results show that the sliced ELL-F format-based parallelization strategy can reach significant speedups compared to the Compressed Sparse Row (CSR) format.
Keywords: Finite Element Method (FEM); Graphics Processing Unit (GPU); parallelization strategy; Conjugate Gradient Norm (CGN); sliced ELL Four (sliced ELL-F)
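The abstract does not spell out the sliced ELL-F layout, so the sketch below shows only the plain ELL (ELLPACK) format that such sliced variants build on; it is not the paper's format. The helper names `to_ell` and `ell_spmv` are hypothetical. Every row is padded to the same number of stored entries, which gives the regular memory layout that suits a one-thread-per-row sparse matrix vector product on a GPU; here the rows are simply processed in a Python loop.

```python
def to_ell(dense):
    """Pack a dense matrix into ELL arrays: every row is padded to the same width."""
    width = max(sum(1 for v in row if v != 0) for row in dense)
    values, columns = [], []
    for row in dense:
        nonzeros = [(v, j) for j, v in enumerate(row) if v != 0]
        # Padding entries carry a zero value and a dummy column index of 0.
        nonzeros += [(0.0, 0)] * (width - len(nonzeros))
        values.append([v for v, _ in nonzeros])
        columns.append([j for _, j in nonzeros])
    return values, columns


def ell_spmv(values, columns, x):
    """Compute y = A x from ELL arrays; on a GPU each row would map to one thread."""
    return [sum(v * x[j] for v, j in zip(row_values, row_columns))
            for row_values, row_columns in zip(values, columns)]


A = [[4, 0, 1],
     [0, 3, 0],
     [2, 0, 5]]
values, columns = to_ell(A)
y = ell_spmv(values, columns, [1, 2, 3])
```

Padding wastes little storage when rows have similar, small nonzero counts, which matches the case the abstract highlights (most rows shorter than 8 entries).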
Complex hexagonal close-packed dendritic growth during alloy solidification by graphics processing unit-accelerated three-dimensional phase-field simulations: demo for Mg–Gd alloy
7
Authors: Sheng-Lan Yang, Jing Zhong, Kai Wang, Xun Kang, Jian-Bao Gao, Jiong Wang, Qian Li, Li-Jun Zhang. Rare Metals [SCIE, EI, CAS, CSCD], 2023, No. 10, pp. 3468-3484 (17 pages)
In this study, insights into the effect of interfacial anisotropy on complex hexagonal close-packed (hcp) dendritic growth during alloy solidification were gained by graphics processing unit (GPU)-accelerated three-dimensional (3D) phase-field simulations, as demonstrated for a Mg-Gd alloy. An anisotropic phase-field model with finite interface dissipation was developed by incorporating the contribution of the anisotropy of interfacial energy into the total free energy functional. The modified spherical harmonic anisotropy function was then chosen for the hcp crystal. The GPU parallel computing algorithm was implemented in the present phase-field model, and a corresponding code was developed on the compute unified device architecture parallel computing platform. Benchmark tests indicated that the calculation efficiency of a single TESLA V100 GPU could be ~80 times that of open multi-processing (OpenMP) with eight central processing unit cores. By coupling the phase-field model with reliable thermodynamic and interfacial energy descriptions, the 3D phase-field simulation of α-Mg dendritic growth in the Mg-6Gd (in wt%) alloy during solidification was performed. Various two-dimensional dendrite morphologies were revealed by cutting the simulated 3D dendrite along different crystallographic planes. Typical sixfold equiaxed and butterflied microstructures observed in experiments were well reproduced.
Keywords: interfacial anisotropy; dendrite solidification; phase-field model; graphics processing unit (GPU); Mg–Gd
GPU-accelerated vector-form particle-element method for 3D elastoplastic contact of structures
8
Authors: Wei WANG, Yanfeng ZHENG, Jingzhe TANG, Chao YANG, Yaozhi LUO. Journal of Zhejiang University-Science A (Applied Physics & Engineering) [SCIE, EI, CAS, CSCD], 2023, No. 12, pp. 1120-1130 (11 pages)
A graphics processing unit (GPU)-accelerated vector-form particle-element method, i.e., the finite particle method (FPM), is proposed for 3D elastoplastic contact of structures involving strong nonlinearities and computationally expensive contact calculations. A hexahedral FPM element with reduced integration and anti-hourglass control is developed to model structural elastoplastic behaviors. The 3D space containing contact surfaces is decomposed into cubic cells, and the contact search is performed between adjacent cells to improve search efficiency. A connected list data structure is used for storing contact particles to facilitate the parallel contact search procedure. The contact constraints are enforced by explicitly applying normal and tangential contact forces to the contact particles. The proposed method is fully accelerated by GPU-based parallel computing. After verification, the performance of the proposed method is compared with the serial finite element code Abaqus/Explicit by testing two large-scale contact examples. The maximum speedup of the proposed method over Abaqus/Explicit is approximately 80 for the overall computation and 340 for contact calculations. Therefore, the proposed method is shown to be effective and efficient.
Keywords: graphics processing unit (GPU); parallel acceleration; elastoplastic contact; contact search; finite particle method (FPM)
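The cell decomposition and list-based storage described in this abstract resemble the widely used cell linked-list neighbor search, which can be sketched as follows. This is a serial illustration under assumptions, not the authors' GPU implementation: the names `build_cells` and `contact_pairs`, the dictionary used for cell heads, and the simple distance criterion for "contact" are all hypothetical choices.

```python
from itertools import product


def build_cells(points, cell):
    """Bin particles into cubic cells, chaining each cell's particles in a linked list."""
    head = {}                    # cell index -> most recently inserted particle
    nxt = [-1] * len(points)     # particle -> next particle in the same cell (-1 ends the list)
    for i, (x, y, z) in enumerate(points):
        key = (int(x // cell), int(y // cell), int(z // cell))
        nxt[i] = head.get(key, -1)
        head[key] = i
    return head, nxt


def contact_pairs(points, cell, radius):
    """Return index pairs closer than radius, searching only the 27 surrounding cells."""
    if radius > cell:
        raise ValueError("cell size must be at least the search radius")
    head, nxt = build_cells(points, cell)
    pairs = set()
    for (cx, cy, cz), first in head.items():
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            start = head.get((cx + dx, cy + dy, cz + dz), -1)
            i = first
            while i != -1:
                j = start
                while j != -1:
                    if i < j:    # record each pair once
                        d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                        if d2 <= radius * radius:
                            pairs.add((i, j))
                    j = nxt[j]
                i = nxt[i]
    return pairs


points = [(0.1, 0.1, 0.1), (0.9, 0.1, 0.1), (1.1, 0.1, 0.1), (3.0, 3.0, 3.0)]
pairs = contact_pairs(points, cell=1.0, radius=0.5)
```

Restricting the search to adjacent cells replaces the all-pairs comparison with a local one, and the flat `nxt` array is the kind of structure that maps naturally onto GPU memory.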
A GPU accelerated finite volume coastal ocean model (Cited by: 1)
9
Authors: 赵旭东, 梁书秀, 孙昭晨, 赵西增, 孙家文, 刘忠波. Journal of Hydrodynamics [SCIE, EI, CSCD], 2017, No. 4, pp. 679-690 (12 pages)
With the unstructured grid, the Finite Volume Coastal Ocean Model (FVCOM) is converted from its original FORTRAN code to Compute Unified Device Architecture (CUDA) C code and optimized on the Graphic Processor Unit (GPU). The proposed GPU-FVCOM is tested against analytical solutions for two standard cases in a rectangular basin: a tide-induced flow and a wind-induced circulation. It is then applied to Ningbo's coastal water area to simulate the tidal motion and analyze the flow field and the vertical tide velocity structure. The simulation results agree with the measured data quite well. The performance of the proposed 3-D model is 30 times that of a single-thread program, and the GPU-FVCOM implemented on a Tesla K20 device is faster than on a workstation with 20 CPU cores, which shows that the GPU-FVCOM is efficient for solving large-scale sea area and high-resolution engineering problems.
Keywords: Graphic Processor Unit (GPU); 3-D ocean model; unstructured grid; finite volume coastal ocean model (FVCOM)
Novel Geometrical Voxelization Approach with Application to Streamlines (Cited by: 1)
10
Authors: 谢咸熙, 张勤振, 戴文凯, 沈汉威. Journal of Computer Science & Technology [SCIE, EI, CSCD], 2010, No. 5, pp. 895-904 (10 pages)
This paper presents a novel geometrical voxelization algorithm for polygonal models. First, distance computation is performed slice by slice on graphics processing units (GPUs) between geometrical primitives and voxels for line/surface voxelization. A novel solid filling process is then proposed to assist surface voxelization and achieve solid voxelization. Furthermore, using the proposed transfer functions, both binary and anti-aliasing voxelizations are achievable. Finally, the proposed approach can be applied to voxelize streamlines for 3D vector fields using line voxelization. The proposed approach obtains the desired experimental results.
Keywords: voxelization; distance field; graphics processing unit (GPU); visualization
Compute Unified Device Architecture Implementation of Euler/Navier-Stokes Solver on Graphics Processing Unit Desktop Platform for 2-D Compressible Flows
11
Authors: Zhang Jiale, Chen Hongquan. Transactions of Nanjing University of Aeronautics and Astronautics [EI, CSCD], 2016, No. 5, pp. 536-545 (10 pages)
A personal desktop platform with teraflops peak performance from thousands of cores is realized at the price of a conventional workstation using programmable graphics processing units (GPUs). A GPU-based parallel Euler/Navier-Stokes solver is developed for 2-D compressible flows by using NVIDIA's Compute Unified Device Architecture (CUDA) programming model in the CUDA Fortran programming language. The implementation techniques for CUDA kernels, the double-layered thread hierarchy and the varied memory hierarchy are presented to form the GPU-based algorithm for the Euler/Navier-Stokes equations. The resulting parallel solver is validated by a set of typical test flow cases. The numerical results show that a speedup of dozens of times relative to a serial CPU implementation can be achieved using a single GPU desktop platform, which demonstrates that a GPU desktop can serve as a cost-effective parallel computing platform to accelerate computational fluid dynamics (CFD) simulations substantially.
Keywords: graphics processing unit (GPU); GPU parallel computing; compute unified device architecture (CUDA) Fortran; finite volume method (FVM); acceleration
Simulating coupled dynamics of a rigid-flexible multibody system and compressible fluid (Cited by: 3)
12
Authors: Wei Hu, Qiang Tian, HaiYan Hu. Science China (Physics, Mechanics & Astronomy) [SCIE, EI, CAS, CSCD], 2018, No. 4, pp. 54-68 (15 pages)
As a continuation of the authors' previous studies, a new parallel computation approach is proposed to simulate the coupled dynamics of a rigid-flexible multibody system and compressible fluid. In this approach, the smoothed particle hydrodynamics (SPH) method is used to model the compressible fluid, and the natural coordinate formulation (NCF) and absolute nodal coordinate formulation (ANCF) are used to model the rigid and flexible bodies, respectively. In order to model the compressible fluid properly and efficiently via the SPH method, three measures are taken. The first is to use the Riemann solver to cope with the fluid compressibility, the second is to define virtual SPH particles to model the dynamic interaction between the fluid and the multibody system, and the third is to impose the boundary conditions of periodical inflow and outflow to reduce the number of SPH particles involved in the computation. Afterwards, a parallel computation strategy based on the graphics processing unit (GPU) is proposed to detect the neighboring SPH particles and to solve the dynamic equations of the SPH particles in order to improve the computation efficiency. Meanwhile, the generalized-alpha algorithm is used to solve the dynamic equations of the multibody system. Finally, four case studies are given to validate the proposed parallel computation approach.
Keywords: smoothed particle hydrodynamics (SPH); compressible flow; Riemann solver; absolute nodal coordinate formulation (ANCF); graphics processing unit (GPU)
GPU-accelerated phase field simulation of directional solidification (Cited by: 1)
13
Authors: GAO Ang, HU YanSu, WANG ZhiJun, MU DeJun, LI JunJie, WANG JinCheng. Science China (Technological Sciences) [SCIE, EI, CAS], 2014, No. 6, pp. 1191-1197 (7 pages)
Phase field simulation has been actively studied as a powerful method to investigate microstructural evolution during solidification. However, it is a great challenge to perform phase field simulation at large length and time scales. Graphics processing unit (GPU) computation is applied to the phase field simulation, greatly improving the calculation efficiency. The results show that the computation with the GPU is about 36 times faster than that with a single Central Processing Unit (CPU) core. This makes GPU-accelerated phase field simulation feasible on a desktop computer. The GPU-accelerated strategy will bring new opportunities to the application of phase field simulation.
Keywords: phase field simulation; directional solidification; graphics processing unit (GPU) acceleration; compute unified device architecture (CUDA); speed-up ratio
Graphic Processing Unit Based Phase Retrieval and CT Reconstruction for Differential X-Ray Phase Contrast Imaging
14
Authors: 陈晓庆, 王宇杰, 孙建奇. Journal of Shanghai Jiaotong University (Science) [EI], 2014, No. 5, pp. 550-554 (5 pages)
Compared with conventional X-ray absorption imaging, X-ray phase-contrast imaging shows higher contrast on samples with low attenuation coefficients, such as blood vessels and soft tissues. Among the modalities of phase-contrast imaging, grating-based phase contrast imaging has been widely accepted owing to its wide range of sample selections and its exemption from a coherent source. However, the downside is the substantially larger amount of data generated by the phase-stepping method, which slows down the reconstruction process. The graphic processing unit (GPU) has the advantage of allowing parallel computing, which is very useful for processing large quantities of data. In this paper, a compute unified device architecture (CUDA) C program based on the GPU is introduced to accelerate the phase retrieval and filtered back projection (FBP) algorithm for grating-based tomography. Depending on the size of the data, the CUDA C program shows different amounts of speed-up over the standard C program on the same Visual Studio 2010 platform. Meanwhile, the speed-up ratio increases as the size of the data increases.
Keywords: grating-based phase contrast imaging; parallel computing; graphic processing unit (GPU); compute unified device architecture (CUDA); filtered back projection (FBP)
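The abstract does not state which retrieval formula the program uses. A common choice for phase-stepping data, assumed here, is to take the phase of the first Fourier component of each pixel's stepping curve and subtract the phase measured without the sample. The sketch below does this for a single pixel, with the hypothetical helper names `stepping_phase` and `differential_phase`, and it assumes at least three equally spaced steps over one grating period. Each pixel is processed independently, which is why the step parallelizes well on a GPU.

```python
import cmath
import math


def stepping_phase(intensities):
    """Phase of the first Fourier component of one pixel's phase-stepping curve."""
    n = len(intensities)
    first = sum(v * cmath.exp(-2j * math.pi * k / n) for k, v in enumerate(intensities))
    return cmath.phase(first)


def differential_phase(sample, reference):
    """Sample phase minus reference phase, wrapped to the interval [-pi, pi]."""
    delta = stepping_phase(sample) - stepping_phase(reference)
    return math.atan2(math.sin(delta), math.cos(delta))


def synthetic_curve(offset, amplitude, phase, steps):
    """Ideal stepping curve a + b*cos(2*pi*k/N + phase), used only to exercise the sketch."""
    return [offset + amplitude * math.cos(2 * math.pi * k / steps + phase)
            for k in range(steps)]


reference = synthetic_curve(10.0, 3.0, 0.2, steps=8)
sample = synthetic_curve(8.0, 2.0, 0.7, steps=8)
shift = differential_phase(sample, reference)
```

In this synthetic case the sample curve is shifted by 0.5 rad relative to the reference, and the retrieved value does not depend on the offset or amplitude of either curve.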
Graphic Processing Unit-Accelerated Neural Network Model for Biological Species Recognition
15
Authors: 温程璐, 潘伟, 陈晓熹, 祝青园. Journal of Donghua University (English Edition) [EI, CAS], 2012, No. 1, pp. 5-8 (4 pages)
A graphic processing unit (GPU)-accelerated biological species recognition method using a partially connected neural evolutionary network model is introduced in this paper. The partially connected neural evolutionary network adopted in the paper can overcome the disadvantage of traditional neural networks, which accept only small inputs. The whole image is taken as the input of the neural network, so the maximum number of features can be kept for recognition. To speed up the recognition process of the neural network, a fast implementation of the partially connected neural network was developed on an NVIDIA Tesla C1060 using the NVIDIA compute unified device architecture (CUDA) framework. Image sets of eight biological species were obtained to test the GPU implementation and its serial CPU counterpart. The experimental results showed that the GPU implementation works effectively in terms of both recognition rate and speed, achieving a speedup of 343 over its CPU counterpart. Compared with a feature-based recognition method on the same recognition task, the method also achieved an acceptable correct rate of 84.6% when tested on eight biological species.
Keywords: graphic processing unit (GPU); compute unified device architecture (CUDA); neural network; species recognition
A GPU-Based Parallel Algorithm for 2D Large Deformation Contact Problems Using the Finite Particle Method (Cited by: 1)
16
Authors: Wei Wang, Yanfeng Zheng, Jingzhe Tang, Chao Yang, Yaozhi Luo. Computer Modeling in Engineering & Sciences [SCIE, EI], 2021, No. 11, pp. 595-626 (32 pages)
Large deformation contact problems generally involve highly nonlinear behaviors, which are very time-consuming to compute and may lead to convergence issues. The finite particle method (FPM) effectively separates pure deformation from total motion in large deformation problems. In addition, the decoupled procedures of the FPM make it suitable for parallel computing, which may provide an approach to solving time-consuming problems. In this study, a graphics processing unit (GPU)-based parallel algorithm is proposed for two-dimensional large deformation contact problems. The fundamentals of the FPM for planar solids are first briefly introduced, including the equations of motion of particles and the internal forces of quadrilateral elements. Subsequently, a linked-list data structure suitable for parallel processing is built, and parallel global and local search algorithms are presented for contact detection. The contact forces are then derived and directly exerted on particles. The proposed method is implemented with the main solution procedures executed in parallel on a GPU. Two verification problems comprising large deformation frictional contacts are presented, and the accuracy of the proposed algorithm is validated. Furthermore, the algorithm's performance is investigated via a large-scale contact problem, and the maximum speedups of total computational time and contact calculation reach 28.5 and 77.4, respectively, relative to the commercial finite element software Abaqus/Explicit running on a single-core central processing unit (CPU). The contact calculation accounts for only 18% of the total calculation time with the FPM, much less than the 50% with Abaqus/Explicit, demonstrating the efficiency of the proposed method.
Keywords: finite particle method; graphics processing unit (GPU); parallel computing; contact algorithm; large
BADF: Bounding Volume Hierarchies Centric Adaptive Distance Field Computation for Deformable Objects on GPUs
17
Authors: Xiao-Rui Chen, Min Tang, Cheng Li, Dinesh Manocha, Ruo-Feng Tong. Journal of Computer Science & Technology [SCIE, EI, CSCD], 2022, No. 3, pp. 731-740 (10 pages)
We present a novel algorithm, BADF (Bounding Volume Hierarchy Based Adaptive Distance Fields), for accelerating the construction of ADFs (adaptive distance fields) of rigid and deformable models on graphics processing units. Our approach is based on constructing a bounding volume hierarchy (BVH), and we use that hierarchy to generate an octree-based ADF. We exploit the coherence between successive frames and sort the grid points of the octree to accelerate the computation. Our approach is applicable to rigid and deformable models. Our GPU-based (graphics processing unit based) algorithm is about 20x-50x faster than current mainstream central processing unit based algorithms. Our BADF algorithm can construct the distance fields for deformable models with 60k triangles at interactive rates on an NVIDIA GTX GeForce 1060. Moreover, we observe a 3x speedup over prior GPU-based ADF algorithms.
Keywords: distance field; deformable object; graphics processing unit (GPU); octree; bounding volume hierarchy
Fast modeling of gravity gradients from topographic surface data using GPU parallel algorithm (Cited by: 1)
18
Authors: Xuli Tan, Qingbin Wang, Jinkai Feng, Yan Huang, Ziyan Huang. Geodesy and Geodynamics [CSCD], 2021, No. 4, pp. 288-297 (10 pages)
The gravity gradient is the second derivative of the gravity potential and contains more high-frequency information about Earth's gravity field. Gravity gradient observation data require the deduction of their prior and intrinsic parts to obtain more variational information. A model generated from a topographic surface database is more appropriate for representing gradiometric effects derived from near-surface mass, as other kinds of data can hardly reach the required spatial resolution. The rectangular prism method, namely an analytic integration of Newtonian potential integrals, is a reliable and commonly used approach to modeling the gravity gradient, but its computing efficiency is extremely low. A modified rectangular prism method and a graphical processing unit (GPU) parallel algorithm were proposed to speed up the modeling process. The modified method avoided massive redundant computations by transforming the formulas according to the symmetries of the prisms' integral regions, and the proposed algorithm parallelized this method's computing process. The parallel algorithm was compared with a conventional serial algorithm using 100 elevation data in two topographic areas (rough and moderate terrain). Modeling differences between the two algorithms were less than 0.1 E, which is attributed to precision differences between single-precision and double-precision floating-point numbers. The parallel algorithm showed computational efficiency approximately 200 times higher than the serial algorithm in experiments, demonstrating that it effectively speeds up the modeling process. Further analysis indicates that both the modified method and the computational parallelism of the GPU contributed to the proposed algorithm's performance in experiments.
Keywords: gravity gradient; topographic surface data; rectangle prism method; parallel computation; graphical processing unit (GPU)
Bypass-Enabled Thread Compaction for Divergent Control Flow in Graphics Processing Units
19
Authors: LI Bingchao, WEI Jizeng, GUO Wei, SUN Jizhou. Journal of Shanghai Jiaotong University (Science) [EI], 2021, No. 2, pp. 245-256 (12 pages)
Graphics processing units (GPUs) employ single instruction multiple data (SIMD) hardware to run threads in parallel and allow each thread to maintain an arbitrary control flow. Threads running concurrently within a warp may jump to different paths after conditional branches. Such divergent control flow leaves some lanes idle and hence reduces the SIMD utilization of GPUs. To alleviate the waste of SIMD lanes, threads from multiple warps can be collected together to improve SIMD lane utilization by compacting threads into idle lanes. However, this mechanism induces extra barrier synchronizations, since warps have to be stalled to wait for other warps for compactions, so that in some cases no warps are scheduled. In this paper, we propose an approach to reduce the overhead of barrier synchronizations induced by compactions. In our approach, a compaction is bypassed by warps whose threads all jump to the same path after branches. Moreover, warps waiting for a compaction can also bypass this compaction when no warps are ready for issuing. In addition, a compaction is canceled if idle lanes cannot be reduced via this compaction. The experimental results demonstrate that our approach provides an average improvement of 21% over the baseline GPU for applications with massive divergent branches, while recovering the performance loss induced by compactions by 13% on average for applications with many non-divergent control flows.
Keywords: graphics processing unit (GPU); single instruction multiple data (SIMD); thread; warps; bypass
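The effect of divergence and compaction on lane utilization can be made concrete with a toy model. This is a simplified illustration, not the paper's mechanism: the function names are hypothetical, each thread is represented only by the branch path it takes, and the model ignores the lane-position constraints and barrier costs that real hardware compaction must respect.

```python
def simd_utilization(active_per_issue, warp_size):
    """Fraction of SIMD lanes doing useful work over a list of issued instructions."""
    return sum(active_per_issue) / (warp_size * len(active_per_issue))


def without_compaction(warps):
    """Each warp issues once per distinct branch path taken by its own threads."""
    return [warp.count(path) for warp in warps for path in sorted(set(warp))]


def with_compaction(warps, warp_size):
    """Threads on the same path are pooled across warps and repacked into full warps."""
    issues = []
    for path in sorted({p for warp in warps for p in warp}):
        threads = sum(warp.count(path) for warp in warps)
        full, rest = divmod(threads, warp_size)
        issues += [warp_size] * full + ([rest] if rest else [])
    return issues


def can_bypass(warp):
    """A warp whose threads all take the same path gains nothing from compaction."""
    return len(set(warp)) == 1


warps = [["T", "F", "T", "F"], ["T", "T", "F", "F"]]   # branch outcome of each thread
before = simd_utilization(without_compaction(warps), warp_size=4)
after = simd_utilization(with_compaction(warps, 4), warp_size=4)
```

In this example half of the lanes sit idle without compaction, while repacking fills every lane; `can_bypass` marks the uniform warps that the abstract says can skip the compaction barrier.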
MoM-PO/SBR Algorithm Based on Collaborative Platform and Mixed Model
20
Authors: TANG Xiaobin, FENG Yuan, GONG Xiaoyan. Transactions of Nanjing University of Aeronautics and Astronautics [EI, CSCD], 2019, No. 4, pp. 589-598 (10 pages)
For electromagnetic scattering from 3-D complex electrically large conducting targets, a new hybrid algorithm, the MoM-PO/SBR algorithm, is presented to realize the exchange of information between the method of moment (MoM) and physical optics (PO)/shooting and bouncing ray (SBR). In the algorithm, the COC file, which is based on the Huygens equivalence principle, is introduced, and the conversion interface between the equivalent surface and the target is established. The multi-task flow model presented in this paper is then adopted to conduct CPU/graphics processing unit (GPU) tests of the algorithm under three modes, i.e., MPI/OpenMP, MPI/compute unified device architecture (CUDA) and the multi-task programming model (MTPM). Numerical results are presented and compared with reference solutions in order to illustrate the accuracy and efficiency of the proposed algorithm.
Keywords: graphics processing unit (GPU); multi-task programming model (MTPM); physical optics (PO); method of moment (MoM)