Parallel optimization algorithm for k-means document clustering based on CUDA (Cited by: 2)
Abstract: To speed up k-means on large data volumes, and exploiting both the compute-intensive nature of k-means and the characteristics of the Compute Unified Device Architecture (CUDA), this paper proposes a register-optimized parallel clustering algorithm and a sliding-door parallel center-computation algorithm. The register-optimized parallel clustering algorithm optimizes the clustering (assignment) step: it raises GPU register utilization and lowers data-fetch latency. The sliding-door parallel center-computation algorithm optimizes the center-update step: it avoids data synchronization and makes better use of the GPU compute cores. Experimental results show that the parallel-optimized k-means achieves a speedup of up to about 137x on a GTX 480, substantially improving the efficiency of k-means on a single machine.
Source: Computer Engineering and Design, indexed in CSCD and the Peking University Core Journals list, 2013, No. 11, pp. 4032-4036, 4071 (6 pages)
Funding: National Natural Science Foundation of China (61271280, 61001100); National Science and Technology Support Program of the 12th Five-Year Plan (2011BAD21B05)
Keywords: k-means; document clustering; CUDA; parallel computation; GPU
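The abstract splits each k-means iteration into two phases: assigning every document to its nearest center, and recomputing the centers. The following is a minimal sequential C++ sketch of how those phases decompose for data-parallel hardware. It assumes dense document vectors, squared Euclidean distance, and a fixed tile size; the function names `assign` and `update` and the tile-based partial-sum scheme are hypothetical illustrations, not the paper's register-optimized or sliding-door kernels, whose details are not given here.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Illustrative sketch only (not the paper's kernels). Documents and centers are
// assumed to be dense float vectors stored row-major: n docs x d dims, k x d centers.
// Squared Euclidean distance is an assumption; the paper's measure is not stated here.

// Phase 1, assignment: each document is independent, so on a GPU one thread
// could process one document. Here each iteration of the outer loop plays that role.
std::vector<int> assign(const std::vector<float>& docs,
                        const std::vector<float>& centers,
                        std::size_t n, std::size_t k, std::size_t d) {
    std::vector<int> label(n, 0);
    for (std::size_t i = 0; i < n; ++i) {
        float best = std::numeric_limits<float>::max();
        int bestC = 0;
        for (std::size_t c = 0; c < k; ++c) {
            float dist = 0.0f;  // per-thread accumulator; would live in a register on a GPU
            for (std::size_t j = 0; j < d; ++j) {
                const float diff = docs[i * d + j] - centers[c * d + j];
                dist += diff * diff;
            }
            if (dist < best) {
                best = dist;
                bestC = static_cast<int>(c);
            }
        }
        label[i] = bestC;
    }
    return label;
}

// Phase 2, update: documents are split into tiles of `tile` documents. Each tile
// accumulates into its own private slots (on a GPU, one thread block per tile),
// and a second pass sums the tiles. Tiles never write to each other's slots,
// so this pass needs no cross-tile atomics or synchronization.
std::vector<float> update(const std::vector<float>& docs,
                          const std::vector<int>& label,
                          const std::vector<float>& oldCenters,
                          std::size_t n, std::size_t k, std::size_t d,
                          std::size_t tile) {
    assert(tile > 0 && "tile size must be positive");
    const std::size_t tiles = (n + tile - 1) / tile;
    std::vector<float> partSum(tiles * k * d, 0.0f);
    std::vector<std::size_t> partCnt(tiles * k, 0);

    for (std::size_t t = 0; t < tiles; ++t) {
        for (std::size_t i = t * tile; i < n && i < (t + 1) * tile; ++i) {
            const std::size_t c = static_cast<std::size_t>(label[i]);
            partCnt[t * k + c] += 1;
            for (std::size_t j = 0; j < d; ++j)
                partSum[(t * k + c) * d + j] += docs[i * d + j];
        }
    }

    // Reduce the per-tile partial results; an empty cluster keeps its old center.
    std::vector<float> centers(oldCenters);
    for (std::size_t c = 0; c < k; ++c) {
        std::size_t cnt = 0;
        std::vector<float> sum(d, 0.0f);
        for (std::size_t t = 0; t < tiles; ++t) {
            cnt += partCnt[t * k + c];
            for (std::size_t j = 0; j < d; ++j)
                sum[j] += partSum[(t * k + c) * d + j];
        }
        if (cnt > 0)
            for (std::size_t j = 0; j < d; ++j)
                centers[c * d + j] = sum[j] / static_cast<float>(cnt);
    }
    return centers;
}
```

Calling `assign` and then `update` repeatedly until the labels stop changing gives plain k-means. On a GPU, the outer loop of `assign` would map to threads and the tile loop of `update` to thread blocks; keeping partial sums per tile is one common way to avoid global atomic updates, chosen here for clarity rather than taken from the paper.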