Abstract
A good K-means clustering algorithm should meet at least two requirements: (1) it should reflect the validity of the clustering, that is, the number of clusters should match the actual problem; (2) it should be able to handle noisy data. The traditional K-means algorithm is a local search algorithm that is sensitive to initialization and easily trapped in a local optimum. To address this shortcoming, a K-means algorithm with optimized initial centers is proposed, which selects as initial cluster centers the k data objects that lie in high-density regions and are farthest from one another. Experiments show that the algorithm depends only weakly on the initial data, converges quickly, and yields high clustering quality. To reflect clustering validity and obtain more accurate clustering results, a hybrid algorithm (PGKM) that combines the optimized K-means algorithm (PKM) with a genetic algorithm is also proposed. While improving compactness (intra-cluster distance) and separation (inter-cluster distance), it automatically searches for the best number of clusters k, optimizes the k initial centers before clustering, and iterates until it obtains the optimal clustering that satisfies the termination condition. Experiments show that the hybrid algorithm achieves better clustering quality and overall performance.
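The initial-center selection described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's PKM procedure: the abstract does not say how density is measured or how "farthest apart" is resolved, so the sketch assumes density is the number of neighbors within a fixed `radius`, treats points with at least mean density as "high density", and picks centers greedily by maximum distance to the nearest center already chosen. The function name `density_farthest_init` and the `radius` parameter are hypothetical.

```python
import math


def density_farthest_init(points, k, radius):
    """Pick k initial centers that are dense and mutually far apart.

    Illustrative sketch only; the density definition, the mean-density
    threshold, and the greedy max-min rule are assumptions.
    """
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")

    # Assumed density: number of other points within `radius`.
    density = [
        sum(
            1
            for j in range(n)
            if j != i and math.dist(points[i], points[j]) <= radius
        )
        for i in range(n)
    ]

    # Assumed threshold: keep points whose density is at least the mean,
    # which tends to exclude isolated (noisy) points.
    mean_density = sum(density) / n
    candidates = [i for i in range(n) if density[i] >= mean_density]
    if len(candidates) < k:
        candidates = list(range(n))  # too few dense points: use all points

    # Start from the densest candidate, then repeatedly add the candidate
    # whose distance to its nearest chosen center is largest.
    chosen = [max(candidates, key=lambda i: density[i])]
    while len(chosen) < k:
        remaining = [i for i in candidates if i not in chosen]
        farthest = max(
            remaining,
            key=lambda i: min(math.dist(points[i], points[c]) for c in chosen),
        )
        chosen.append(farthest)

    return [points[i] for i in chosen]
```

The returned points would then be passed to an ordinary K-means loop as its starting centers. Because isolated points fall below the density threshold, an outlier is unlikely to be chosen as a center even when it is the farthest point from everything else.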
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
PKU Core Journal (北大核心)
2008, No. 23, pp. 166-168, 182 (4 pages in total)
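Funding
Major Project of the Natural Science Foundation of Shandong Province (No. Z2004G02)
Shandong Province Award Fund for Young and Middle-Aged Scientists (No. 03BS003)
Science and Technology Program of the Shandong Provincial Education Department (No. J05G01)
Special funding from the "Taishan Scholar" Construction Project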