
Research of Clustering Algorithms Based on Text Mining

Cited by: 7
Abstract: With the massive accumulation of data on the Internet, effectively extracting the information one needs from huge volumes of text has become a central problem in text mining. This paper studies the application of two clustering algorithms, K-means and K-medoids, to text mining, and experimentally compares their precision and recall on document clustering, measured against manually judged (human-assigned) categories. The results show that K-medoids exceeds K-means by more than 5 percentage points in both precision and recall, and that K-medoids is more robust to outliers and noisy data.
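The robustness claim comes down to how each algorithm picks its cluster centers: K-means uses the mean of the members, which a single far-away document can drag off, while K-medoids always uses an actual member (the one with the smallest total distance to the rest of its cluster). The sketch below is a minimal illustration, not the paper's implementation. It assumes documents have already been turned into numeric vectors (tiny 2-D vectors stand in for real term vectors), uses Euclidean distance (text mining often uses cosine distance instead), seeds both algorithms from hypothetical initial indices, and uses the simple alternating ("Voronoi iteration") K-medoids variant rather than PAM. All function names are hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(points, centers):
    """Label each point with the index of its nearest center (ties -> lower index)."""
    return [min(range(len(centers)), key=lambda j: dist(p, centers[j])) for p in points]


def kmeans(points, init_idx, max_iter=100):
    """Lloyd-style K-means: each center is the coordinate-wise mean of its members."""
    centers = [list(points[i]) for i in init_idx]
    labels = assign(points, centers)
    for _ in range(max_iter):
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
        new_labels = assign(points, centers)
        if new_labels == labels:  # converged: assignments stopped changing
            break
        labels = new_labels
    return labels, centers


def kmedoids(points, init_idx, max_iter=100):
    """Alternating K-medoids: each center is an actual data point (a medoid)."""
    medoids = list(init_idx)
    labels = assign(points, [points[m] for m in medoids])
    for _ in range(max_iter):
        for j in range(len(medoids)):
            members = [i for i, lab in enumerate(labels) if lab == j]
            if members:
                # New medoid: the member with the smallest total distance to its cluster.
                medoids[j] = min(
                    members,
                    key=lambda c: sum(dist(points[c], points[i]) for i in members),
                )
        new_labels = assign(points, [points[m] for m in medoids])
        if new_labels == labels:
            break
        labels = new_labels
    return labels, medoids


# Toy 2-D "document vectors": two tight groups plus one far-away outlier (index 8).
docs = [[1, 1], [1, 2], [2, 1], [2, 2], [8, 8], [8, 9], [9, 8], [9, 9], [50, 50]]
km_labels, km_centers = kmeans(docs, init_idx=[0, 4])
kmd_labels, kmd_medoids = kmedoids(docs, init_idx=[0, 4])
print(km_labels)   # [0, 0, 0, 0, 0, 0, 0, 0, 1]: the outlier captures a whole cluster
print(kmd_labels)  # [0, 0, 0, 0, 1, 1, 1, 1, 1]: the two real groups stay separate
```

In this toy run the outlier pulls the second K-means center away from its group, so K-means ends up merging the two genuine groups and giving the outlier a cluster of its own; K-medoids keeps the medoid at (9, 9) and simply attaches the outlier to the nearer group. This only illustrates the mechanism behind the reported robustness; it does not reproduce the paper's experiments.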
Source: 《微计算机信息》 (Control & Automation), 2011, No. 2, pp. 168-169 and 65 (3 pages)
Keywords: text mining; K-means; K-medoids; precision rate; recall rate
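The abstract reports precision and recall against manually judged categories but does not say how each cluster is matched to a category. A common scheme, assumed here rather than taken from the paper, matches each cluster to the human category held by most of its documents: precision is the share of the cluster that belongs to that category, recall is the share of that category that landed in the cluster, and per-cluster scores are macro-averaged. The helper names `cluster_precision_recall` and `macro_average` and the example labels are hypothetical.

```python
from collections import Counter


def cluster_precision_recall(clusters, gold):
    """Per-cluster precision/recall against human-assigned categories.

    Assumed scheme: each cluster is matched to the category most of its documents carry.
    """
    category_sizes = Counter(gold)
    results = {}
    for c in sorted(set(clusters)):
        members = [g for k, g in zip(clusters, gold) if k == c]
        category, hits = Counter(members).most_common(1)[0]
        results[c] = {
            "category": category,
            "precision": hits / len(members),           # share of the cluster that is on-topic
            "recall": hits / category_sizes[category],  # share of the topic the cluster found
        }
    return results


def macro_average(results, key):
    """Unweighted mean of one metric over all clusters."""
    return sum(r[key] for r in results.values()) / len(results)


# Cluster labels from some algorithm vs. hypothetical human judgements.
clusters = [0, 0, 0, 0, 1, 1, 1, 1, 1]
gold = ["sport"] * 4 + ["finance"] * 4 + ["sport"]
scores = cluster_precision_recall(clusters, gold)
print(scores[0])  # {'category': 'sport', 'precision': 1.0, 'recall': 0.8}
print(scores[1])  # {'category': 'finance', 'precision': 0.8, 'recall': 1.0}
print(macro_average(scores, "precision"), macro_average(scores, "recall"))
```

Majority matching lets two clusters claim the same category; if the evaluation requires a one-to-one mapping, a different matching step (for example, an optimal assignment between clusters and categories) would be needed.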