New genetic K-means clustering algorithm based on meliorated initial center (cited by: 17)
Abstract A good K-means clustering algorithm should meet at least two requirements: (1) it should reflect clustering validity, that is, the number of clusters should match the actual problem; and (2) it should be able to handle noisy data. The traditional K-means algorithm is a local search method that is sensitive to initialization and prone to getting trapped in local optima. To address this shortcoming, a K-means algorithm with optimized initial centers is proposed, which selects as initial cluster centers the k data objects that lie in high-density regions and are farthest apart from one another. Experiments show that the algorithm depends only weakly on the initial data, converges quickly, and produces high-quality clusters. To reflect clustering validity and obtain more accurate results, a hybrid algorithm (PGKM) is further proposed that combines the optimized K-means algorithm (PKM) with a genetic algorithm. PGKM automatically searches for the best number of clusters k while improving compactness (intra-cluster distance) and separation (inter-cluster distance); it optimizes the k initial centers before clustering and iterates until a termination condition is met, yielding the optimal clustering. Experiments show that the hybrid algorithm achieves better clustering quality and overall performance.
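Source Computer Engineering and Applications (《计算机工程与应用》), CSCD, Peking University Core Journal, 2008, No. 23, pp. 166-168, 182 (4 pages)
Funding Major Project of the Natural Science Foundation of Shandong Province (No. Z2004G02); Shandong Province Award Fund for Young and Middle-aged Scientists (No. 03BS003); Science and Technology Program of the Shandong Provincial Department of Education (No. J05G01); "Taishan Scholar" Construction Project special fund
Keywords clustering; K-means algorithm; genetic algorithm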
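
The initial-center selection described in the abstract (k points that lie in high-density regions and are far from one another) can be illustrated with a minimal Python sketch. The abstract does not give the paper's exact definitions, so the following are assumptions made only for illustration: density is the number of points within a radius `eps`, a point counts as "high density" when its density is at or above the mean, and "farthest apart" is approximated greedily by max-min distance. The function names `density_farthest_init` and `kmeans` are hypothetical, and the genetic search for k (PGKM) is not covered.

```python
import math


def dist(a, b):
    """Euclidean distance between two points given as equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def density_farthest_init(points, k, eps):
    """Pick k initial centers: dense points that are far from each other.

    Assumptions (not taken from the paper): density = number of neighbours
    within radius eps (the point itself included); "high density" = density
    at or above the mean; "farthest apart" = greedy max-min distance.
    """
    density = [sum(1 for q in points if dist(p, q) <= eps) for p in points]
    mean_density = sum(density) / len(points)
    candidates = [i for i, d in enumerate(density) if d >= mean_density]
    if len(candidates) < k:
        raise ValueError("fewer than k high-density points; adjust eps or k")
    # Start from the densest candidate (first one on ties).
    chosen = [max(candidates, key=lambda i: density[i])]
    while len(chosen) < k:
        # Add the candidate whose nearest chosen center is farthest away.
        nxt = max(
            (i for i in candidates if i not in chosen),
            key=lambda i: min(dist(points[i], points[c]) for c in chosen),
        )
        chosen.append(nxt)
    return [tuple(points[i]) for i in chosen]


def kmeans(points, centers, max_iter=100):
    """Plain Lloyd iterations starting from the given centers."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda j: dist(p, centers[j]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels
```

Because only high-density points are candidates, an isolated outlier cannot be picked as a starting center, which is one plausible reading of how the approach reduces sensitivity to noise. A call such as `kmeans(points, density_farthest_init(points, k=2, eps=1.5))` runs standard K-means from the selected centers; the value of `eps` is a tuning choice of this sketch, not a parameter stated in the source.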
