
k-cmeans text clustering algorithm based on similar centroids (cited by: 12)
Abstract: The k-means clustering algorithm is guaranteed to converge only to a local optimum, which makes its results sensitive to the choice of initial cluster centers. To address this problem, a text clustering algorithm based on similar centroids is proposed. First, the similarity between documents is measured and the documents are sorted in descending order of similarity; the first k documents in the sequence are selected as the initial cluster centers. Each remaining document (one not selected as an initial cluster center) is then assigned to the cluster whose center it is most similar to. The cluster means are updated and the process is iterated until the means no longer change, which yields a more stable clustering result. Experimental results show that the proposed algorithm significantly improves macro-average clustering precision and macro-average recall, and produces clusters of better quality.
Source: Computer Engineering and Design (《计算机工程与设计》), indexed in CSCD and the Peking University Core Journals list, 2010, No. 8, pp. 1802-1805 (4 pages)
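Funding: Ministry of Industry and Information Technology (MIIT) 2007 Electronic Information Industry Development Fund project (工信部运[2007]97号)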
Keywords: clustering; k-cmeans algorithm; similarity measurement; macro-average clustering precision; macro-average recall
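
The procedure outlined in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how documents are scored before sorting, so the sketch assumes each document is ranked by its mean cosine similarity to all other documents. The function names `normalize_rows` and `similarity_seeded_kmeans`, the `max_iter` parameter, and the handling of empty clusters are likewise assumptions introduced for the example.

```python
import numpy as np


def normalize_rows(X):
    """Scale each row to unit L2 norm so dot products are cosine similarities."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero rows unchanged
    return X / norms


def similarity_seeded_kmeans(X, k, max_iter=100):
    """Cluster the rows of X (document vectors) into k groups.

    Hypothetical helper: the seeding score (mean cosine similarity to all
    other documents) is an assumption, not taken from the paper.
    """
    X = normalize_rows(np.asarray(X, dtype=float))
    n = X.shape[0]

    # Step 1: measure pairwise similarity and score every document.
    sim = X @ X.T
    np.fill_diagonal(sim, 0.0)  # ignore self-similarity
    score = sim.sum(axis=1) / max(n - 1, 1)

    # Step 2: sort by descending score and take the first k documents as centers.
    seeds = np.argsort(-score, kind="stable")[:k]
    centers = X[seeds].copy()

    labels = np.full(n, -1)
    for _ in range(max_iter):
        # Step 3: assign each document to the center it is most similar to.
        new_labels = np.argmax(X @ normalize_rows(centers).T, axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments, and therefore the means, no longer change
        labels = new_labels
        # Step 4: update each cluster mean; keep the old center if a cluster is empty.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return labels, centers
```

Because the seeds are chosen deterministically from the similarity ranking, repeated runs on the same data give the same partition, which is the stability property the abstract emphasizes. In practice `X` would hold term-weight vectors such as TF-IDF rows, one per document.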

