Document Clustering Description Algorithm Based on Machine Learning
(Chinese title: 基于机器学习的文本聚类描述算法研究; times cited: 1)

Abstract: When traditional clustering algorithms are applied directly to document clustering, a prominent problem arises: they only group objects into clusters and give no conceptual description or explanation of the clusters they produce. The task of labeling the clusters generated by clustering a document collection is called the clustering description problem. A cluster description helps users quickly determine whether a generated document cluster is relevant to their information needs, which makes labeling clustered documents an important and challenging task in document clustering applications. To address the weak readability of traditional document clustering results, this paper proposes an algorithm that improves the interpretability and readability of clustering results: a document clustering description algorithm based on support vector machines (SVM), that is, a machine-learning approach to automatically labeling document clusters. Experimental results show that the SVM-based description algorithm outperforms conventional clustering description methods and gives users more concise and comprehensive document clustering results.
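The abstract does not say how the SVM turns a cluster into a label. One common way to do it, assumed here purely for illustration and not necessarily the author's procedure, is to train a one-vs-rest linear SVM for each cluster on TF-IDF features and read off the terms with the largest positive weights. The sketch below uses only the Python standard library; the function names (`tfidf_vectors`, `train_linear_svm`, `label_clusters`), the Pegasos-style solver, the parameters `lam`, `epochs` and `top_k`, and the toy documents are all hypothetical choices.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocab, vectors): L2-normalised TF-IDF vectors over a shared vocabulary."""
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenizer (assumption)
    vocab = sorted({t for toks in tokenized for t in toks})
    index = {t: i for i, t in enumerate(vocab)}
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))  # document frequency
    vectors = []
    for toks in tokenized:
        vec = [0.0] * len(vocab)
        for term, tf in Counter(toks).items():
            # Smoothed IDF: terms occurring in every document still get a small weight.
            vec[index[term]] = tf * (math.log((1 + n) / (1 + df[term])) + 1.0)
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vocab, vectors


def train_linear_svm(X, y, lam=0.01, epochs=50):
    """Linear SVM (hinge loss + L2 penalty, no bias term) fitted by Pegasos-style
    stochastic subgradient descent. Samples are visited in a fixed cyclic order
    so the sketch is deterministic."""
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1.0 - eta * lam) * wj for wj in w]  # step on the L2 regulariser
            if margin < 1.0:  # hinge loss active: step along its subgradient
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w


def label_clusters(docs, assignments, top_k=3, lam=0.01, epochs=50):
    """Label each cluster with the top_k terms carrying the largest positive weight
    in a one-vs-rest linear SVM that separates that cluster from all the others."""
    vocab, X = tfidf_vectors(docs)
    labels = {}
    for cluster in sorted(set(assignments)):
        y = [1 if a == cluster else -1 for a in assignments]  # this cluster vs. the rest
        w = train_linear_svm(X, y, lam, epochs)
        ranked = sorted(range(len(vocab)), key=lambda j: (-w[j], vocab[j]))
        labels[cluster] = [vocab[j] for j in ranked[:top_k] if w[j] > 0]
    return labels


# Toy input: two pre-computed clusters (the clustering step itself is out of scope here).
docs = [
    "svm kernel margin classifier",
    "svm margin hinge loss",
    "kernel svm training margin",
    "football match goal league",
    "league goal striker football",
    "match referee football goal",
]
assignments = [0, 0, 0, 1, 1, 1]
print(label_clusters(docs, assignments, top_k=2))
```

Because each cluster is scored against all other clusters, terms that also occur elsewhere receive little or negative weight, so the labels favour discriminative terms over merely frequent ones. Which features and which label-selection rule the paper actually uses is not stated in the abstract.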

Author: Zhang Chengzhi (章成志)

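Affiliations: Department of Information Management, School of Economics and Management, Nanjing University of Science and Technology; Institute of Scientific and Technical Information of China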

Source: Journal of the China Society for Scientific and Technical Information (《情报学报》), 2009, No. 2, pp. 225-232 (8 pages). Indexed in CSSCI and the Peking University core journals list.

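Funding: Key Project of the National Key Technology R&D Program of the 11th Five-Year Plan (2006BAH03B02); Youth Research Support Fund of Nanjing University of Science and Technology (JGQN0701); Research Start-up Fund of Nanjing University of Science and Technology (AB41123); 2006 Jiangsu Province Graduate Training Innovation Project.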

Keywords: clustering description; document clustering; support vector machine; machine learning

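Classification numbers: TP391.41 [Automation and Computer Technology: Computer Application Technology]; I0 [Automation and Computer Technology: Computer Science and Technology]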
