Abstract
The k-means algorithm is widely used in data mining, machine learning, and computer vision because it is conceptually simple and computationally efficient. However, its performance depends heavily on the choice of initial cluster centres: different initial centres can lead to markedly different clustering results. A reasonable approach is to choose data samples that lie in relatively dense regions of the data as the initial cluster centres. Accordingly, we propose a method for selecting the initial centres of k-means based on a neighbourhood graph of the data. The method has three stages: 1) construct a local neighbourhood graph of the dataset; 2) select a candidate set of initial cluster centres; 3) determine appropriate initial cluster centres. Experimental results show that the initial cluster centres chosen by the proposed method are reasonable and that they also speed up the convergence of k-means.
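The three stages can be illustrated with a minimal Python sketch. The abstract does not specify how the graph is built, how candidates are scored, or how the final centres are picked, so every concrete choice here is an assumption made for illustration and not the authors' method: a brute-force Euclidean k-nearest-neighbour graph, a density score equal to the inverse mean neighbour distance, a candidate set made of the densest fraction of points, and a greedy farthest-point pick among the candidates. The function names and the parameters `n_neighbors` and `keep_fraction` are hypothetical.

```python
import numpy as np


def knn_graph(X, n_neighbors):
    """Stage 1 (assumed form): brute-force Euclidean k-nearest-neighbour graph."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour
    idx = np.argsort(dist, axis=1)[:, :n_neighbors]
    nbr_dist = np.take_along_axis(dist, idx, axis=1)
    return idx, nbr_dist


def candidate_set(nbr_dist, keep_fraction, min_size):
    """Stage 2 (assumed form): keep the points with the highest local density."""
    # Assumed density score: inverse of the mean distance to the neighbours.
    density = 1.0 / (nbr_dist.mean(axis=1) + 1e-12)
    n_keep = max(min_size, int(np.ceil(keep_fraction * len(density))))
    candidates = np.argsort(-density)[:n_keep]
    return candidates, density


def select_centres(X, candidates, density, n_clusters):
    """Stage 3 (assumed form): greedy farthest-point pick among candidates."""
    # Start from the densest candidate.
    chosen = [int(candidates[np.argmax(density[candidates])])]
    C = X[candidates]
    while len(chosen) < n_clusters:
        # Distance from each candidate to its nearest already-chosen centre.
        d = np.linalg.norm(C[:, None, :] - X[chosen][None, :, :], axis=2).min(axis=1)
        chosen.append(int(candidates[np.argmax(d)]))
    return X[chosen]


def init_centres(X, n_clusters, n_neighbors=10, keep_fraction=0.3):
    """Run the three illustrative stages and return n_clusters initial centres."""
    _, nbr_dist = knn_graph(X, n_neighbors)
    candidates, density = candidate_set(nbr_dist, keep_fraction, n_clusters)
    return select_centres(X, candidates, density, n_clusters)


# Small demonstration on synthetic data: three well-separated Gaussian blobs.
rng = np.random.default_rng(0)
true_means = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([m + 0.5 * rng.standard_normal((50, 2)) for m in true_means])
centres = init_centres(X, n_clusters=3)
print(centres)
```

The returned array can be passed as the starting centres to any k-means implementation that accepts explicit initial centres. The brute-force distance matrix needs memory quadratic in the number of samples, so this sketch is only suitable for small datasets.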
Source
Computer Applications and Software (《计算机应用与软件》)
CSCD
PKU Core Journal (北大核心)
2014, Issue 4, pp. 178-181, 192 (5 pages in total)
Keywords
Clustering
k-means
Initialisation
Neighbourhood graph