Abstract
K-means is an important clustering algorithm that is widely used in Internet information processing. Because K-means terminates at a local optimum, the choice of initial cluster centers strongly influences the clustering result. To address this problem, a similarity matrix is constructed for the text collection, and the initial center points are selected from the set of mean similarities by sorting and iteration. Experiments show that the method effectively reduces the number of iterations and improves clustering accuracy, ultimately yielding better clustering results.
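The abstract gives only an outline of the seeding procedure, so the following is a minimal sketch of one plausible reading, not the authors' exact method. It assumes documents are already represented as numeric vectors, uses cosine similarity for the similarity matrix, and walks the documents in descending order of mean similarity, accepting a document as a center only if it is sufficiently dissimilar to the centers already chosen. The function names, the `max_sim` threshold, and the fallback step are all hypothetical choices made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_matrix(docs):
    """Pairwise similarity matrix of the document vectors."""
    n = len(docs)
    return [[cosine(docs[i], docs[j]) for j in range(n)] for i in range(n)]


def mean_similarities(sim):
    """Mean similarity of each document to all other documents."""
    n = len(sim)
    if n < 2:
        return [0.0] * n
    return [(sum(row) - row[i]) / (n - 1) for i, row in enumerate(sim)]


def pick_initial_centers(docs, k, max_sim=0.5):
    """Return indices of k documents to use as initial k-means centers.

    Assumption (not from the source): a candidate is accepted only if its
    similarity to every center chosen so far is below `max_sim`.
    """
    sim = similarity_matrix(docs)
    means = mean_similarities(sim)
    # Sort document indices by mean similarity, highest first.
    ranked = sorted(range(len(docs)), key=lambda i: means[i], reverse=True)

    centers = []
    for i in ranked:
        if all(sim[i][c] < max_sim for c in centers):
            centers.append(i)
        if len(centers) == k:
            return centers

    # Fallback (hypothetical): too few dissimilar candidates were found,
    # so fill the remaining slots with the highest-ranked unused documents.
    for i in ranked:
        if len(centers) == k:
            break
        if i not in centers:
            centers.append(i)
    return centers


if __name__ == "__main__":
    # Toy data: two documents near each axis, i.e. two obvious groups.
    docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(pick_initial_centers(docs, k=2))
```

The returned indices could then be passed as the starting centers to any standard k-means implementation. Ranking by mean similarity favors documents that are representative of dense regions, while the dissimilarity check keeps the chosen centers from falling into the same cluster.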
Source
Journal of Beijing Institute of Petrochemical Technology (《北京石油化工学院学报》)
2011, No. 4, pp. 55-58 (4 pages)
Keywords
K-means
clustering
initial center point
optimization