An improved K-means algorithm by weighted distance based on the maximum between-cluster variation
(Original title: 基于类间差异最大化的加权距离改进K-means算法. Cited by: 2)

Abstract: To improve the clustering quality of the K-means algorithm, the clustering criterion function is redefined as a weighted sum of the within-cluster squared error (SSE). The reassignment step of the K-means iteration is also modified: each data object is assigned to the centroid with the minimum weighted distance, where the weight depends on the number of data objects in each cluster, and the parameter of the weighted distance is optimized by maximizing the between-cluster variation. Experimental results show that the improved K-means algorithm substantially reduces how often large clusters are split and markedly improves clustering quality.
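The abstract does not state the form of the weighted distance, so the following is only a minimal sketch under a loudly stated assumption: the weighted distance from a point x to centroid c_j is taken to be ||x - c_j||^2 / n_j^p, where n_j is the current size of cluster j and p is a hand-chosen parameter. The function name `weighted_kmeans`, the parameter `p`, and this formula are hypothetical illustrations, not the authors' definitions. The paper's tuning of the parameter by maximizing between-cluster variation is not reproduced here.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def weighted_kmeans(points, init_centroids, p, max_iter=100):
    """K-means variant with a size-weighted assignment step.

    ASSUMPTION (not taken from the paper): the weighted distance is
    ||x - c_j||^2 / n_j**p, so larger clusters attract points more
    strongly. With p = 0 this reduces to ordinary K-means.
    """
    k = len(init_centroids)
    centroids = [tuple(c) for c in init_centroids]
    # Start from an ordinary nearest-centroid assignment to obtain cluster sizes.
    labels = [min(range(k), key=lambda j: sq_dist(x, centroids[j])) for x in points]
    for _ in range(max_iter):
        # Update step: recompute each centroid from its current members.
        for j in range(k):
            members = [x for x, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = mean(members)
        # Cluster sizes used as weights (clamped to 1 to avoid division by zero).
        sizes = [max(labels.count(j), 1) for j in range(k)]
        # Assignment step: pick the centroid with the smallest weighted distance.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(x, centroids[j]) / sizes[j] ** p)
            for x in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


# Toy data: one large, spread-out group (20 points) and one small group (3 points).
points = [(0.5 * i, 0.0) for i in range(20)] + [(14.0, 0.0), (14.5, 0.0), (15.0, 0.0)]
init = [(2.0, 0.0), (8.0, 0.0)]

plain_labels, _ = weighted_kmeans(points, init, p=0.0)
weighted_labels, _ = weighted_kmeans(points, init, p=1.0)
print("cluster sizes, p = 0:", plain_labels.count(0), plain_labels.count(1))
print("cluster sizes, p = 1:", weighted_labels.count(0), weighted_labels.count(1))
```

On this toy data the unweighted run (p = 0) leaves part of the large group attached to the small one, while the weighted run with p = 1 separates the two groups. That is the kind of large-cluster splitting the abstract says the method is meant to reduce, though the behaviour on real data depends on the actual weighting used in the paper.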

Authors: ZHANG Xuefeng (张雪凤), LIU Peng (刘鹏)

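Affiliations: School of Information Management and Engineering, Shanghai University of Finance and Economics; School of Continuing Education, Shanghai University of Finance and Economics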

Source: Journal of Shandong University (Natural Science), 2010, No. 7, pp. 28-33 (6 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.

Funding: Shanghai University of Finance and Economics "211 Project" Phase III Key Discipline Construction Program

Keywords: K-means algorithm; clustering; between-cluster variation; weighted distance

Classification: TP301.6 [Automation and Computer Technology: Computer System Architecture]

