
Modified K-means algorithm based on new cluster validity index (cited by: 4)
Abstract: In the K-means algorithm, the number of clusters k is one of the key factors affecting clustering quality. Many cluster validity methods have been proposed for determining the optimal k, but they do not handle two kinds of data sets well: data sets whose clusters differ in density, and data sets whose clusters lie very close to each other (data sets containing merged clusters). To address this, a new cluster validity index is proposed, defined as the ratio of the squared total length of the data eigen-axes to the minimum between-cluster separation; the optimal number of clusters is the k at which this ratio reaches its minimum. In addition, to reduce the sensitivity of K-means to noise and isolated points, cluster centers are computed with a weighted variant of K-means (K-wmeans). Experiments show that, compared with other algorithms, the K-wmeans algorithm based on the new validity index not only reduces the influence of noise and isolated points on the clustering result but also handles the two kinds of data sets above effectively, markedly improving clustering quality.
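Source: Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2008, No. 12, pp. 3244-3247 (4 pages)
Funding: "Taishan Scholar" Construction Project special fund; Major Project of the Natural Science Foundation of Shandong Province (Z2004G02); Shandong Province Award Fund for Young and Middle-aged Scientists (03BS003); Science and Technology Program of the Shandong Education Department (J05G01)
Keywords: clustering; K-means algorithm; cluster validity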
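
The abstract describes the method only in words, so the following Python sketch is one possible reading of it, not the authors' implementation. Several details are assumptions that the abstract does not specify: the "total length of the data eigen-axes" is taken as the sum, over all clusters, of the square roots of the eigenvalues of each cluster's covariance matrix; the "minimum between-cluster separation" is taken as the smallest Euclidean distance between two cluster centers; and the weighting in `weighted_kmeans` (weight 1 / (1 + distance to the current center)) is a hypothetical stand-in for the paper's weighted center computation. The function names `weighted_kmeans`, `validity_index`, and `select_k` are illustrative only.

```python
import numpy as np


def _assign(X, centers):
    """Return the distance matrix and the nearest-center label of each point."""
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dists, dists.argmin(axis=1)


def weighted_kmeans(X, k, n_iter=50, seed=0):
    """K-means with weighted-mean centers (assumed weighting, see lead-in)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        dists, labels = _assign(X, centers)
        new_centers = centers.copy()
        for j in range(k):
            mask = labels == j
            if not mask.any():
                continue  # keep the old center if the cluster is empty
            # Assumed weights: points far from the current center count
            # less, which damps the pull of noise and isolated points.
            w = 1.0 / (1.0 + dists[mask, j])
            new_centers[j] = (w[:, None] * X[mask]).sum(axis=0) / w.sum()
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    # Recompute labels so they match the returned centers.
    _, labels = _assign(X, centers)
    return centers, labels


def validity_index(X, centers, labels):
    """Squared total eigen-axis length divided by the minimum center distance."""
    total_axis_length = 0.0
    for j in range(len(centers)):
        members = X[labels == j]
        if len(members) < 2:
            continue  # a covariance needs at least two points
        cov = np.atleast_2d(np.cov(members, rowvar=False))
        eigvals = np.clip(np.linalg.eigvalsh(cov), 0.0, None)
        total_axis_length += np.sqrt(eigvals).sum()
    center_dists = np.linalg.norm(
        centers[:, None, :] - centers[None, :, :], axis=2
    )
    # Smallest distance between two distinct centers (requires k >= 2).
    min_sep = center_dists[np.triu_indices(len(centers), k=1)].min()
    return total_axis_length ** 2 / min_sep


def select_k(X, k_values):
    """Run the clustering for each candidate k and pick the smallest index."""
    scores = {}
    for k in k_values:
        centers, labels = weighted_kmeans(X, k)
        scores[k] = validity_index(X, centers, labels)
    return min(scores, key=scores.get), scores
```

Calling `select_k(X, range(2, 10))` on an `(n_samples, n_features)` array returns the candidate k with the smallest index value together with the score for each k. The candidate range must start at 2 or higher, since the separation term is undefined for a single cluster.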

