Modified K-means algorithm based on new cluster validity index (cited by: 4)

Abstract: In the K-means algorithm, the number of clusters k is one of the key factors that determine clustering quality. Many cluster validity measures have been proposed for finding the optimal k, but none of them handles two kinds of data sets well: data sets whose clusters have different densities, and data sets whose clusters lie very close to each other (data sets containing merged clusters). To address this, a new cluster validity function is proposed. It is defined as the ratio of the squared total length of the data eigen-axes to the minimum between-cluster distance, and the optimal number of clusters is the k at which this ratio reaches its minimum. In addition, to reduce the sensitivity of K-means to noise and isolated points, cluster centers are computed with a weighted variant of K-means, called K-wmeans. Experimental results show that, compared with other algorithms, K-wmeans based on the new cluster validity function not only reduces the influence of noise and isolated points on the clustering result but also handles the two kinds of data sets described above effectively, markedly improving clustering quality.
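The two ideas in the abstract can be illustrated with a minimal Python sketch for 2-D points. Several details are assumptions, because the abstract does not give them: the eigen-axis lengths are taken as the square roots of the eigenvalues of each cluster's covariance matrix and summed over all clusters, the between-cluster distance is measured between plain centroids, and the weighting rule in `weighted_center` (weight = 1 / (1 + distance to the plain centroid)) is hypothetical. All function names are illustrative and do not come from the paper.

```python
import math


def mean_point(points):
    """Plain centroid of a list of (x, y) tuples."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def weighted_center(points):
    """Centroid in which points far from the plain centroid count less.

    ASSUMPTION: weight = 1 / (1 + distance to the plain centroid). The
    abstract only says the centers are weighted; this rule is hypothetical.
    """
    c = mean_point(points)
    weights = [1.0 / (1.0 + math.dist(p, c)) for p in points]
    total = sum(weights)
    return (sum(w * p[0] for w, p in zip(weights, points)) / total,
            sum(w * p[1] for w, p in zip(weights, points)) / total)


def axis_lengths(points):
    """Square roots of the two eigenvalues of the 2x2 population covariance."""
    cx, cy = mean_point(points)
    n = len(points)
    a = sum((x - cx) ** 2 for x, _ in points) / n
    c = sum((y - cy) ** 2 for _, y in points) / n
    b = sum((x - cx) * (y - cy) for x, y in points) / n
    mid = (a + c) / 2.0
    half = math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    # max(..., 0.0) guards against tiny negative values caused by rounding.
    return (math.sqrt(max(mid + half, 0.0)), math.sqrt(max(mid - half, 0.0)))


def validity_index(clusters):
    """(total eigen-axis length)^2 / minimum distance between centroids.

    ASSUMPTION: the total length sums the axis lengths of every cluster.
    Needs at least two clusters with distinct centroids.
    """
    total = sum(sum(axis_lengths(c)) for c in clusters)
    centers = [mean_point(c) for c in clusters]
    min_sep = min(math.dist(centers[i], centers[j])
                  for i in range(len(centers))
                  for j in range(i + 1, len(centers)))
    return total ** 2 / min_sep


if __name__ == "__main__":
    a = [(0, 0), (1, 0), (0, 1), (1, 1)]
    b = [(10, 10), (11, 10), (10, 11), (11, 11)]
    # Candidate partitions for k = 2 and k = 4 (hand-made, not from K-means).
    candidates = {2: [a, b], 4: [a[:2], a[2:], b[:2], b[2:]]}
    best_k = min(candidates, key=lambda k: validity_index(candidates[k]))
    print("k with the smallest index:", best_k)
    print("weighted center with an outlier:", weighted_center(a + [(20, 20)]))
```

In a full pipeline the candidate partitions would come from running K-means (with `weighted_center` in the update step) once per k, and the k with the smallest index would be kept. The sketch uses hand-made partitions so that the behaviour of the index is easy to check.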

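Authors: SUN Xiujuan (孙秀娟), LIU Xiyu (刘希玉)

Affiliations: School of Information Science and Engineering, Shandong Normal University; School of Management, Shandong Normal University

Source: Journal of Computer Applications (《计算机应用》), CSCD, Peking University Core Journal, 2008, No. 12, pp. 3244-3247 (4 pages)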

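Funding: Special fund of the "Taishan Scholar" construction project; Major Project of the Natural Science Foundation of Shandong Province (Z2004G02); Shandong Province Award Fund for Young and Middle-aged Scientists (03BS003); Science and Technology Program of the Shandong Education Department (J05G01)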

Keywords: clustering; K-means algorithm; cluster validity

Classification number: TP301.6 [Automation and Computer Technology: Computer System Architecture]
