
Document Clustering Description Algorithm Based on Machine Learning
(Original title: 基于机器学习的文本聚类描述算法研究; cited by: 1)
Abstract: When traditional clustering algorithms are applied directly to text clustering, a prominent problem is that they only group objects; they do not describe or explain the concepts behind the resulting clusters. Labeling the clusters produced by clustering a document collection is known as the clustering description problem. A clustering description helps users quickly judge whether a generated document category is relevant to their information needs, and producing one is an important and challenging task in text clustering applications. To address the weak readability of text clustering results, this paper proposes an algorithm that improves their comprehensibility and readability: a text clustering description algorithm based on support vector machines (SVM). Experimental results show that the SVM-based clustering description algorithm outperforms conventional clustering description methods and gives users more concise and comprehensive clustering results.
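Author: Zhang Chengzhi (章成志)
Source: Journal of the China Society for Scientific and Technical Information (《情报学报》), indexed in CSSCI and the Peking University Core Journals list, 2009, No. 2, pp. 225-232 (8 pages)
Funding: Key Project of the National Key Technology R&D Program during the 11th Five-Year Plan (2006BAH03B02); Nanjing University of Science and Technology Young Researcher Support Fund (JGQN0701); Nanjing University of Science and Technology Research Start-up Fund (AB41123); 2006 Jiangsu Province Graduate Education Innovation Project.
Keywords: clustering description; document clustering; support vector machine; machine learning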
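
The record above does not spell out how the SVM is applied, so the following is only a minimal sketch of one way an SVM could be used to label clusters, not the paper's algorithm. It assumes that each candidate label term is represented by two hypothetical features (coverage inside the cluster and rarity outside it) and that a linear SVM, trained here by plain sub-gradient descent on the hinge loss, scores the candidates. The function names, features, toy training set, and hyperparameters are all illustrative assumptions.

```python
# Illustrative sketch only: the features, data, and names are assumptions,
# not taken from the paper summarised above.

def term_features(term, cluster_docs, other_docs):
    """Two hypothetical features for a candidate label term."""
    # Fraction of in-cluster documents that contain the term.
    coverage = sum(term in doc for doc in cluster_docs) / len(cluster_docs)
    # Fraction of out-of-cluster documents that contain the term.
    outside = (sum(term in doc for doc in other_docs) / len(other_docs)
               if other_docs else 0.0)
    return [coverage, 1.0 - outside]


def decision(w, b, x):
    """Linear SVM decision value w.x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Sub-gradient descent on the L2-regularised hinge loss (labels are +1/-1)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            violated = yi * decision(w, b, xi) < 1.0
            # The L2 shrinkage applies on every step.
            w = [wj - lr * lam * wj for wj in w]
            # The hinge term contributes only when the margin is violated.
            if violated:
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def describe_cluster(cluster_docs, other_docs, w, b, top_k=2):
    """Return the top_k candidate terms ranked by SVM decision value."""
    vocab = sorted({term for doc in cluster_docs for term in doc})
    ranked = sorted(
        vocab,
        key=lambda t: (-decision(w, b, term_features(t, cluster_docs, other_docs)), t),
    )
    return ranked[:top_k]


# Toy, hand-labelled feature vectors: +1 = good label term, -1 = poor label term.
TRAIN_X = [[1.0, 1.0], [0.8, 0.9], [1.0, 0.0], [0.2, 1.0], [0.3, 0.2]]
TRAIN_Y = [1, 1, -1, -1, -1]

# Documents are modelled as sets of terms.
cluster = [{"svm", "kernel", "the"}, {"svm", "margin", "the"}, {"svm", "kernel", "of"}]
others = [{"the", "parser", "of"}, {"the", "grammar"}]

w, b = train_linear_svm(TRAIN_X, TRAIN_Y)
labels = describe_cluster(cluster, others, w, b)
print(labels)
```

Ranking by decision value rather than thresholding it means every cluster receives a description even when no candidate is classified as positive. A real system would need labelled training data and a richer feature set than the two assumed here.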


