
An improved K-means algorithm by weighted distance based on the maximum between-cluster variation (Cited by: 2)
Abstract: To improve the clustering quality of the K-means algorithm, the clustering criterion function is redefined as a weighted sum of the squared error (SSE), and the rule for reassigning data objects during each K-means iteration is modified: each object is assigned to the centroid with the smallest weighted distance, where the weight depends on the number of objects currently in the cluster, and the parameter of the weighted distance is optimized by maximizing the between-cluster variation. Experimental results show that the improved algorithm greatly reduces the chance of large clusters being broken apart and noticeably improves clustering quality.
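As a rough illustration of the scheme described in the abstract, the sketch below implements a size-weighted assignment step and a simple parameter search. It assumes a hypothetical weighted distance of the form n_j**alpha * ||x - c_j||^2 (n_j being the current size of cluster j); the exact weight used in the paper, the parameter name alpha, the candidate grid, and the helper names weighted_kmeans, between_cluster_variation, and fit_with_best_alpha are illustrative assumptions, not taken from the source.

```python
import numpy as np

def between_cluster_variation(X, labels, centroids):
    """Sum over clusters of n_j * ||c_j - global_mean||^2 (a standard definition)."""
    overall = X.mean(axis=0)
    bcv = 0.0
    for j, c in enumerate(centroids):
        n_j = np.sum(labels == j)
        bcv += n_j * np.sum((c - overall) ** 2)
    return bcv

def weighted_kmeans(X, k, alpha, n_iter=100, seed=0):
    """K-means variant whose assignment step uses an assumed size-weighted distance."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    sizes = np.full(k, len(X) / k)            # start from equal-sized clusters

    for _ in range(n_iter):
        # Squared Euclidean distance of every point to every centroid, shape (n, k).
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        # Assumed weighted distance n_j**alpha * d^2: larger clusters are penalized,
        # so points near a big cluster's boundary are less likely to be pulled away.
        labels = np.argmin((sizes ** alpha) * d2, axis=1)

        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                new_centroids[j] = members.mean(axis=0)
                sizes[j] = len(members)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids

    return labels, centroids

def fit_with_best_alpha(X, k, alphas=(0.0, 0.05, 0.1, 0.2, 0.3)):
    """Pick alpha by maximizing between-cluster variation (hypothetical grid search)."""
    best = None
    for a in alphas:
        labels, centroids = weighted_kmeans(X, k, alpha=a)
        bcv = between_cluster_variation(X, labels, centroids)
        if best is None or bcv > best[0]:
            best = (bcv, a, labels, centroids)
    return best
```

Since alpha = 0 reduces to standard K-means, sweeping a small grid of alpha values and keeping the partition with the largest between-cluster variation is one plausible way to realize the parameter optimization the abstract describes.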
Authors: 张雪凤, 刘鹏
Source: Journal of Shandong University (Natural Science) (《山东大学学报(理学版)》), 2010, Issue 7, pp. 28-33 (6 pages); indexed in CAS, CSCD, and the Peking University Core Journals list.
Funding: Shanghai University of Finance and Economics "211 Project" Phase III Key Discipline Construction Project.
Keywords: K-means algorithm; clustering; between-cluster variation; weighted distance

