Abstract
The traditional K-Means clustering algorithm is guaranteed to converge only to a local optimum, so its results are highly sensitive to the choice of initial representative points. Agglomerative hierarchical clustering needs no initial cluster centers and can automatically produce clustering models of a text collection at different levels, but its computational complexity is high and its merging steps are irreversible. In light of the characteristics of internet public opinion, this paper analyzes the strengths and weaknesses of both algorithms in depth and proposes an improved K-Means algorithm. The core idea is to integrate the two algorithms, combining their respective advantages in initial point selection and in the clustering process. Experimental analysis and practical application show that the improved text clustering algorithm can, to a large extent, improve the accuracy and validity of clustering results for internet public opinion information, as well as the efficiency of the algorithm.
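The abstract gives only the core idea, not the procedure, so the following is a minimal sketch of one way such a combination might look, not the authors' implementation. It assumes documents are already represented as numeric vectors (for example TF-IDF weights), that agglomerative merging with centroid linkage is run on a random sample to obtain k initial centers, and that standard K-Means iterations then refine the partition over the full set. The function names, the `sample_size` parameter, and the linkage choice are all hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def agglomerative_centroids(points, k):
    """Merge the two closest clusters (centroid linkage) until k remain.

    Assumption: centroid linkage; the source does not name a linkage.
    """
    if k < 1 or k > len(points):
        raise ValueError("k must be between 1 and the number of points")
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sq_dist(mean(clusters[i]), mean(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])
        del clusters[j]
    return [mean(c) for c in clusters]


def kmeans(points, centroids, max_iter=100):
    """Standard K-Means iterations starting from the given centroids."""
    labels = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        # Update step: recompute each centroid; keep the old one if empty.
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centroids.append(mean(members) if members else centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


def hybrid_cluster(points, k, sample_size=50, seed=0):
    """Hierarchical initialization on a sample, then K-Means on all points."""
    rng = random.Random(seed)
    sample = rng.sample(points, min(sample_size, len(points)))
    init = agglomerative_centroids(sample, k)
    return kmeans(points, init)
```

Running the costly pairwise merging only on a sample keeps the hierarchical step affordable, while the K-Means pass can reassign points and so is not bound by the irreversible merges; whether the paper makes the same trade-off is not stated in the abstract.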
Source
《电脑开发与应用》
2010, No. 8, pp. 4-6, 15 (4 pages in total)
Computer Development & Applications
Funding
Project funded by the Shanxi Department of Personnel (SX20090108-07)
Keywords
internet public opinion
text clustering
K-Means algorithm
hierarchical agglomerative clustering
clustering process