Quality Evaluation of Text Clustering Algorithms (cited by: 7)

Quality Evaluation for Three Textual Document Clustering Algorithms
Abstract: Clustering quality is evaluated quantitatively on standard classification test collections, and the k-Means clustering algorithm, the STC (Suffix Tree Clustering) algorithm, and an Ant-Based clustering algorithm are compared experimentally. Analysis of the results shows that STC clusters better because it takes the phrase properties of text fully into account when processing documents; that the results of Ant-Based clustering depend strongly on its input parameters; and that introducing text properties into the Ant-Based clustering algorithm can improve the quality of its results.
Textual document clustering is one of the effective approaches to establishing a classification instance of a huge textual document set. Clustering validation, or quality evaluation, techniques can be used to assess the efficiency and effectiveness of a clustering algorithm. This paper presents quality evaluation criteria. Based on these criteria, three typical textual document clustering algorithms are assessed experimentally. The comparison shows that the STC (Suffix Tree Clustering) algorithm performs better than the k-Means and Ant-Based clustering algorithms. The better performance of STC comes from taking linguistic properties into account when processing the documents. The performance of the Ant-Based clustering algorithm varies with its input variables. Adopting linguistic properties is necessary to improve the performance of Ant-Based text clustering.
Source: Journal of the Graduate School of the Chinese Academy of Sciences (《中国科学院研究生院学报》), CAS, CSCD, 2006, No. 5, pp. 640-646 (7 pages)
Funding: Supported by the Ministry of Science and Technology of China project "Online Collaborative Research Platform for State Key Laboratories" (2003DEA5G0407)
Keywords: textual document clustering, quality evaluation, clustering validation, STC (suffix tree clustering), Ant-Based clustering, k-Means clustering
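
The abstract describes scoring clusterings against a labeled classification test collection but does not name the exact criteria used. As a rough illustration only, the sketch below computes two measures commonly used for this kind of evaluation, purity and size-weighted cluster entropy. The choice of these two measures, the function names, and the toy data are assumptions for illustration, not taken from the paper.

```python
from collections import Counter, defaultdict
from math import log2


def _class_counts(clusters, labels):
    """Group gold class labels by cluster id (helper for both measures)."""
    counts = defaultdict(Counter)
    for cluster_id, label in zip(clusters, labels):
        counts[cluster_id][label] += 1
    return counts


def purity(clusters, labels):
    """Fraction of documents that belong to the majority class of their cluster.

    Higher is better; 1.0 means every cluster contains a single class.
    """
    counts = _class_counts(clusters, labels)
    majority_total = sum(max(c.values()) for c in counts.values())
    return majority_total / len(labels)


def weighted_entropy(clusters, labels):
    """Size-weighted average of the class entropy inside each cluster.

    Lower is better; 0.0 means every cluster contains a single class.
    """
    counts = _class_counts(clusters, labels)
    total = len(labels)
    result = 0.0
    for c in counts.values():
        size = sum(c.values())
        entropy = -sum((k / size) * log2(k / size) for k in c.values())
        result += (size / total) * entropy
    return result


# Hypothetical toy data: cluster assignments from some algorithm,
# and the gold class of each document from a labeled test collection.
clusters = [0, 0, 0, 1, 1, 1]
labels = ["a", "a", "b", "b", "b", "b"]

print(purity(clusters, labels))
print(weighted_entropy(clusters, labels))
```

Running the same two functions on the output of each algorithm (for example k-Means, STC, and an ant-based method) over the same labeled collection would give directly comparable scores, which is the general shape of the comparison the abstract reports.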
