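Abstract: FDK reconstruction of three-dimensional cone-beam CT images is computationally expensive, so when high-resolution images are reconstructed the running time often fails to meet the needs of practical applications. Cluster parallel computing is one of the common ways to address this problem. On an SMP cluster system, the MPI and Pthreads models are combined: message passing between nodes and shared memory within each node together give a two-level parallelization of the FDK algorithm.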
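The two-level scheme can be illustrated with a small work-partitioning sketch. The abstract does not say how the reconstruction is divided, so this assumes the volume is split by z-slices: an outer split across cluster nodes (the role MPI ranks would play) and an inner split across the threads of one node (the role Pthreads would play). The names `split_range` and `two_level_partition` are hypothetical and are not taken from the paper.

```python
def split_range(start, stop, parts):
    """Split [start, stop) into `parts` contiguous, near-equal sub-ranges."""
    base, extra = divmod(stop - start, parts)
    ranges = []
    lo = start
    for i in range(parts):
        # The first `extra` parts take one additional element each.
        hi = lo + base + (1 if i < extra else 0)
        ranges.append((lo, hi))
        lo = hi
    return ranges


def two_level_partition(num_slices, num_nodes, threads_per_node):
    """Return plan[node][thread] = (first_slice, end_slice), end exclusive.

    Assumption: work is divided by z-slice. The outer split stands in for
    the MPI level, the inner split for the Pthreads level.
    """
    plan = []
    for node_lo, node_hi in split_range(0, num_slices, num_nodes):
        plan.append(split_range(node_lo, node_hi, threads_per_node))
    return plan


plan = two_level_partition(num_slices=512, num_nodes=4, threads_per_node=8)
print(plan[0][0], plan[3][7])
```

In a real implementation each node would receive only its own slice range through message passing, while the threads inside a node would write their slices into a shared output buffer.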
Abstract: This paper studies a high-speed, text-independent Automatic Speaker Recognition (ASR) algorithm based on a Gaussian Mixture Model (GMM) running on a multicore system. The high speed is achieved by parallelizing the feature extraction and aggregation methods used during training and testing. Shared-memory parallel programming techniques using both the OpenMP and PThreads libraries are developed to accelerate the code and improve the performance of the ASR algorithm. The experimental results show a speedup of about 3.2 on a personal laptop with an Intel i5-6300HQ (2.3 GHz, four cores without hyper-threading) and 8 GB of RAM. In addition, a speaker recognition accuracy of 100% is achieved.
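Abstract: To help public security agencies locate vehicles involved in crimes more efficiently, vehicle recognition itself must be made faster. According to statistics, extracting the region of interest (ROI) accounts for about 60% of the vehicle-model recognition process, so accelerating ROI extraction is particularly important. First, a basic parallel algorithm is implemented by data partitioning. Then, guided by experimental analysis, the decomposition of the preprocessing stage is carefully designed on top of the basic parallel algorithm, and multi-queue buffers are introduced to reduce both the number of threads sharing each buffer and the number of times each buffer's mutex is locked. Experiments show that, on a dual-CPU, 12-core server (up to 24 threads with hyper-threading), the proposed algorithm achieves a 13.1x speedup over the serial algorithm.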
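The multi-queue buffer idea can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes one producer that distributes tasks round-robin and one private queue per worker, so each queue's internal lock is shared by only two threads instead of by every worker. The function name `process_with_multi_queues` and the round-robin policy are assumptions made for the example.

```python
import queue
import threading


def process_with_multi_queues(items, num_workers, work):
    """Apply `work` to every item using one private queue per worker thread."""
    queues = [queue.Queue() for _ in range(num_workers)]
    results = [None] * len(items)

    def consumer(q):
        while True:
            task = q.get()
            if task is None:  # sentinel: no more work for this worker
                break
            index, item = task
            # Each index is written by exactly one thread, so no extra lock is needed.
            results[index] = work(item)

    threads = [threading.Thread(target=consumer, args=(q,)) for q in queues]
    for t in threads:
        t.start()

    # Producer: round-robin distribution. Each queue is touched only by the
    # producer and a single consumer, which limits contention on its lock.
    for index, item in enumerate(items):
        queues[index % num_workers].put((index, item))
    for q in queues:
        q.put(None)
    for t in threads:
        t.join()
    return results


print(process_with_multi_queues(list(range(10)), 3, lambda x: x * x))
```

CPython's global interpreter lock means this sketch shows the buffer structure rather than a real speedup; the contention benefit described in the abstract applies to native threads guarded by mutexes.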
Abstract: To solve the single-source shortest path (SSSP) problem efficiently for massive road networks in geographical information systems, a new synchronization method is proposed for implementations of the parallel SSSP algorithm. It implements a spinlock in inline assembly to keep the overhead of coordinating multiple threads small. The performance of the method is compared with the widely used Pthreads application programming interfaces and with the strong sequential solution provided by DIMACS. The experimental platform is a shared-address-space workstation with two processors (eight cores in total) clocked at 3 GHz. The problem instances comprise a directed road network of the USA with more than 23 million vertices and 57 million edges, together with 11 of its subnetworks of various sizes. The method solves the SSSP problem on the USA road network in 1231 ms, whereas the Pthreads version takes 1808 ms and the DIMACS sequential solution takes 4856 ms. It achieves a speedup of 3.95, making it 47% faster than the Pthreads version, whose speedup is 2.69. The method performs better as the problem instance grows.
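The reported figures can be checked directly from the three timings. The sketch below only reworks the numbers quoted in the abstract; the variable names are illustrative. With the rounded millisecond values the spinlock speedup comes out at about 3.94 rather than the stated 3.95, a gap that is presumably due to rounding of the published timings.

```python
def speedup(baseline_ms, parallel_ms):
    """Speedup of a parallel run relative to a baseline run."""
    return baseline_ms / parallel_ms


DIMACS_MS = 4856    # sequential DIMACS solution
PTHREADS_MS = 1808  # parallel version synchronized with Pthreads
SPINLOCK_MS = 1231  # parallel version synchronized with the inline-assembly spinlock

spinlock_speedup = speedup(DIMACS_MS, SPINLOCK_MS)
pthreads_speedup = speedup(DIMACS_MS, PTHREADS_MS)
relative_gain = PTHREADS_MS / SPINLOCK_MS - 1  # how much faster than Pthreads

print(f"spinlock speedup: {spinlock_speedup:.2f}")
print(f"Pthreads speedup: {pthreads_speedup:.2f}")
print(f"gain over Pthreads: {relative_gain:.0%}")
```

The 47% figure follows either from the ratio of the two timings (1808 / 1231) or from the ratio of the two speedups (3.95 / 2.69); both give roughly 1.47.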
Abstract: This paper presents a highly parallel batch processing engine for SPARQL queries. Machine learning algorithms are applied to predict the execution time of each query and to group the queries accordingly, and the memory footprint of the queries is then estimated to decide the order in which each group of queries is run. Finally, the queries are processed in parallel using pthreads. Building on these three points, a spall time prediction algorithm that includes data processing is proposed to handle batch SPARQL queries better, and the use of pthreads makes query processing faster. Because data processing is part of the query time prediction, the method can be applied to any combination of dataset and queries. Experiments show that the engine reduces processing time and makes full use of memory when processing batch SPARQL queries.
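One possible shape for the grouping and ordering step is sketched below. The abstract does not specify the policy, so everything here is an assumption: queries are spread over groups by a greedy longest-first heuristic on predicted time, and each group is then ordered by estimated memory, smallest first. The function `group_queries` and the tuple layout `(name, predicted_ms, estimated_mb)` are hypothetical, and the predictions are taken as given rather than produced by a learned model.

```python
import heapq


def group_queries(queries, num_groups):
    """Group queries so that the total predicted time per group is balanced.

    queries: list of (name, predicted_ms, estimated_mb) tuples.
    Returns a list of groups, each ordered by estimated memory (ascending).
    """
    # Min-heap of (total predicted time so far, group index).
    heap = [(0.0, g) for g in range(num_groups)]
    heapq.heapify(heap)
    groups = [[] for _ in range(num_groups)]

    # Greedy heuristic: place the longest remaining query into the
    # currently lightest group.
    for q in sorted(queries, key=lambda q: q[1], reverse=True):
        load, g = heapq.heappop(heap)
        groups[g].append(q)
        heapq.heappush(heap, (load + q[1], g))

    # Assumed ordering rule: run queries with a smaller footprint first.
    for group in groups:
        group.sort(key=lambda q: q[2])
    return groups


queries = [("q1", 90, 10), ("q2", 60, 40), ("q3", 50, 5), ("q4", 20, 30)]
for group in group_queries(queries, 2):
    print([name for name, _, _ in group])
```

Each resulting group could then be handed to its own worker thread, which is where pthreads would enter in a native implementation.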