Funding: Supported by the National Natural Science Foundation of China (20221603 and 20906091).
Abstract: Many-core processors, such as graphics processing units (GPUs), are promising platforms for intrinsically parallel algorithms such as the lattice Boltzmann method (LBM). Although tremendous speedup has been obtained on a single GPU compared with mainstream CPUs, the performance of the LBM on multiple GPUs has not been studied extensively and systematically. In this article, we carry out LBM simulations on a GPU cluster with many nodes, each having multiple Fermi GPUs. Asynchronous execution with CUDA stream functions, OpenMP, and non-blocking MPI communication are incorporated to improve efficiency. The algorithm is tested on two-dimensional Couette flow, and the results are in good agreement with the analytical solution. For both the one- and two-dimensional decomposition of space, the algorithm performs well because most of the communication time is hidden. Direct numerical simulation of a two-dimensional gas-solid suspension containing more than one million solid particles and one billion gas lattice cells demonstrates the potential of this algorithm in large-scale engineering applications. The algorithm can be directly extended to the three-dimensional decomposition of space and to other modeling methods, including explicit grid-based methods.
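The one-dimensional decomposition of space mentioned in the abstract can be illustrated with a toy example. The sketch below is not the authors' implementation: it uses plain Python instead of CUDA, OpenMP, and MPI, reduces the lattice to a periodic 1-D array, and all function names are hypothetical. It only shows that a streaming step carried out on separate subdomains, with one boundary value exchanged per neighbour, reproduces the result of the undivided lattice. It assumes the lattice length is divisible by the number of subdomains.

```python
# Toy 1-D domain decomposition with a one-cell halo (all names hypothetical).

def stream_global(f):
    """Shift every value one cell to the right on a periodic 1-D lattice."""
    return [f[-1]] + f[:-1]

def split(f, parts):
    """Split the lattice into equal contiguous subdomains."""
    n = len(f) // parts
    return [f[i * n:(i + 1) * n] for i in range(parts)]

def exchange_halos(subdomains):
    """For each subdomain, fetch the value arriving from its left neighbour.
    On a cluster this is the message that would be overlapped with
    interior work; here it is a plain list lookup."""
    parts = len(subdomains)
    return [subdomains[(i - 1) % parts][-1] for i in range(parts)]

def stream_decomposed(f, parts):
    """Streaming step performed subdomain by subdomain."""
    subdomains = split(f, parts)
    halos = exchange_halos(subdomains)
    # Interior cells use only local data; the first cell uses the halo value.
    updated = [[halo] + sub[:-1] for sub, halo in zip(subdomains, halos)]
    return [value for sub in updated for value in sub]

lattice = list(range(12))
print(stream_decomposed(lattice, 3) == stream_global(lattice))
```

Only the first cell of each subdomain depends on remote data, which is why the interior update can proceed while the boundary message is still in flight.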
Funding: Supported by the National Natural Science Foundation of China under Grant No. 60835003 and the Strategic Priority Research Program of the Chinese Academy of Sciences under Grant No. XDA06030300.
Abstract: Bundle adjustment (BA) is a crucial but time-consuming step in 3D reconstruction. In this paper, we tackle a special class of BA problems in which the reconstructed 3D points are much more numerous than the camera parameters, called Massive-Points BA (MPBA) problems. This is often the case when high-resolution images are used. We present the design and implementation of a new bundle adjustment algorithm for efficiently solving MPBA problems. The use of hardware parallelism, on both multi-core CPUs and GPUs, is explored. Through careful memory-usage design, the graphics-memory limitation is effectively alleviated. Several modern acceleration strategies for bundle adjustment, such as mixed-precision arithmetic, embedded point iteration, and preconditioned conjugate gradients, are explored and compared. Using several high-resolution image datasets, we generate a variety of MPBA problems, on which the performance of five bundle adjustment algorithms is evaluated. The experimental results show that our algorithm is up to 40 times faster than classical Sparse Bundle Adjustment while maintaining comparable precision.
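Of the acceleration strategies listed in the abstract, preconditioned conjugate gradients is the easiest to show in isolation. The sketch below is a generic textbook version, not the paper's solver: it assumes a small dense symmetric positive-definite matrix and a Jacobi (diagonal) preconditioner, neither of which is stated in the abstract, and the function name `pcg` is hypothetical.

```python
# Generic Jacobi-preconditioned conjugate gradient (illustrative sketch only).

def pcg(A, b, tol=1e-10, max_iter=100):
    """Solve A x = b for a symmetric positive-definite matrix A (list of rows)."""
    n = len(b)

    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    inv_diag = [1.0 / A[i][i] for i in range(n)]   # Jacobi preconditioner M^-1
    x = [0.0] * n
    r = list(b)                                    # residual b - A x with x = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]     # preconditioned residual
    p = list(z)                                    # first search direction
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:                 # stop once the residual is small
            break
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)                    # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x

solution = pcg([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
print(solution)  # close to [1/11, 7/11]
```

In bundle adjustment such a solver is usually applied to the normal equations of each iteration, where a matrix-free product would replace the dense `matvec` used here.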