Application of Improved K-Means Algorithm to Analysis of Online Public Opinions (cited by: 7)
Abstract: Motivated by the requirements of online public opinion analysis, this paper first introduces the processing of text information and then examines the K-means algorithm for text clustering. Because K-means results depend on the choice of initial cluster centers, the paper proposes an improvement. Based on the idea that a document's title can represent its content, the improved algorithm represents each title as a sparse feature vector, computes the sparse similarity between titles, and uses it to determine the initial cluster centers. Experiments show that the improved K-means algorithm increases clustering accuracy. Compared with an initial-center selection algorithm based on the maximum-minimum distance principle, it improves execution efficiency while maintaining clustering accuracy.
Source: Computer Systems & Applications (《计算机系统应用》), 2011, No. 3, pp. 165-168, 196 (5 pages)
Keywords: online public opinion; K-means algorithm; text clustering; sparse feature vector
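
The abstract names the ingredients of the improved initialization (sparse title vectors, a sparse similarity between titles, and seed selection) but does not give the exact rule. The following is a minimal sketch under stated assumptions, not the authors' implementation: whitespace tokenization, raw term counts as weights, cosine similarity as the "sparse similarity", and a hypothetical greedy rule that picks titles least similar to the seeds already chosen. The function names are illustrative.

```python
import math
from collections import Counter


def sparse_vector(title):
    """Represent a title as a sparse term-count mapping (only non-zero terms stored)."""
    # Assumption: whitespace tokenization; Chinese titles would need a word segmenter.
    return Counter(title.lower().split())


def sparse_cosine(u, v):
    """Cosine similarity computed over the non-zero entries of two sparse vectors."""
    if not u or not v:
        return 0.0
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector
    dot = sum(weight * v.get(term, 0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v)


def pick_initial_centers(titles, k):
    """Return indices of up to k documents whose titles are mutually dissimilar.

    Hypothetical selection rule (not taken from the paper): start from
    document 0, then repeatedly add the document whose title is least
    similar to the seeds chosen so far.
    """
    vectors = [sparse_vector(t) for t in titles]
    seeds = [0]
    # closest[i] = highest similarity between title i and any chosen seed
    closest = [sparse_cosine(vec, vectors[0]) for vec in vectors]
    while len(seeds) < min(k, len(titles)):
        candidates = [i for i in range(len(titles)) if i not in seeds]
        nxt = min(candidates, key=lambda i: closest[i])
        seeds.append(nxt)
        for i, vec in enumerate(vectors):
            closest[i] = max(closest[i], sparse_cosine(vec, vectors[nxt]))
    return seeds


# Toy titles, invented for illustration only.
titles = [
    "stock market falls sharply",
    "stock market rebounds",
    "football final tonight",
    "football final result",
    "new smartphone released",
]
seeds = pick_initial_centers(titles, 3)
print(seeds)  # [0, 2, 4]
```

In a full pipeline, the documents at the returned indices would serve as the starting centers for an ordinary K-means run over the full document vectors. Because titles are short, the similarity computations touch only a few terms each, which is consistent with the efficiency gain the abstract reports.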