
Improvement and application of the k-means algorithm based on outliers and initial centroid selection (Cited by: 4)

Application of an improved k-means algorithm based on outliers and original clustering center
Abstract: The classic k-means algorithm, widely used in clustering, selects its initial centroids at random and is easily affected by outliers. To address these two weaknesses, a twice-improved k-means algorithm is presented. Outliers are first removed with a distance-based method; the selection of the initial cluster centers is then improved with a nearest-neighbor absorption method. Comparison experiments were run before and after the improvement. The results show that the improved algorithm is more stable and more accurate, and that the influence of outliers and of random centroid selection is reduced.
Source: Journal of Shaanxi University of Technology (Natural Science Edition), 2009, No. 3, pp. 45-49 (5 pages)
Funding: Science and Technology Research Project of the Heilongjiang Provincial Department of Education (No. 11521008); Natural Science Foundation of Heilongjiang Province (No. F200603)
Keywords: k-means algorithm; outliers; initial centroid; distance
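The abstract names two preprocessing steps (distance-based outlier removal, then nearest-neighbor absorption to choose initial centers) but this record does not give their details. The following is only a minimal sketch of how such a pipeline might look, not the authors' procedure. The outlier rule (mean distance above the overall mean plus `t` standard deviations), the absorption rule (seed on the closest remaining pair, absorb about n/k nearest points), and all function names are assumptions made for illustration.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance between every pair of rows of X."""
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)

def remove_outliers(X, t=2.0):
    """Assumed rule: drop points whose mean distance to the other points
    exceeds the overall mean of that quantity by more than t std devs."""
    d = pairwise_distances(X)
    mean_d = d.sum(axis=1) / (len(X) - 1)   # self-distance is 0, so divide by n-1
    keep = mean_d <= mean_d.mean() + t * mean_d.std()
    return X[keep]

def absorb_initial_centroids(X, k):
    """Assumed reading of 'nearest-neighbor absorption': seed on one point of
    the closest remaining pair, absorb its nearest neighbors until the group
    holds about n/k points, use the group mean as a centroid, then remove
    the group and repeat. Requires len(X) >= k."""
    remaining = X.copy()
    size = max(1, len(X) // k)
    centroids = []
    for _ in range(k):
        d = pairwise_distances(remaining)
        np.fill_diagonal(d, np.inf)          # ignore self-pairs when seeding
        seed = np.unravel_index(np.argmin(d), d.shape)[0]
        d[seed, seed] = 0.0                  # the seed joins its own group first
        group = np.argsort(d[seed])[:size]
        centroids.append(remaining[group].mean(axis=0))
        remaining = np.delete(remaining, group, axis=0)
    return np.array(centroids)

def assign(X, centroids):
    """Index of the nearest centroid for each point."""
    return np.argmin(
        np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2), axis=1
    )

def kmeans(X, centroids, n_iter=100):
    """Standard Lloyd iterations starting from the given centroids."""
    for _ in range(n_iter):
        labels = assign(X, centroids)
        updated = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(len(centroids))
        ])
        converged = np.allclose(updated, centroids)
        centroids = updated
        if converged:
            break
    return centroids, assign(X, centroids)

# Hypothetical demo data: three synthetic blobs plus one far-away point.
rng = np.random.default_rng(0)
blobs = [c + 0.5 * rng.standard_normal((20, 2)) for c in ([0, 0], [10, 0], [0, 10])]
data = np.vstack(blobs + [np.array([[100.0, 100.0]])])
clean = remove_outliers(data)
final_centroids, final_labels = kmeans(clean, absorb_initial_centroids(clean, 3))
print(final_centroids)
```

Because the initial centers here are computed deterministically from the data rather than drawn at random, repeated runs on the same input give the same clustering, which is the kind of stability the abstract reports.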

References (8)

  • 1 Yang Xiaobing. Research on several key techniques in cluster analysis[D]. Zhejiang University, 2005.
  • 2 Lian Fengna, Wu Jinlin, Tang Qi. An improved K-means clustering algorithm[J]. Computer and Information Technology, 2008, 16(1): 38-40. Cited by: 23
  • 3 Marques J P. Pattern Recognition: Concepts, Methods and Applications[M]. Translated by Wu Yifei. 2nd ed. Beijing: Tsinghua University Press, 2002. 51-74. Cited by: 1
  • 4 Huang Z. A fast clustering algorithm to cluster very large categorical data sets in data mining[A]. Proc. of the SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery[C]. Tucson, 1997. 146-151. Cited by: 1
  • 5 Sambasivam S, Theodosopoulos N. Advanced data clustering methods of mining Web documents[J]. Issues in Informing Science and Information Technology, 2006, (3): 563-579. Cited by: 1
  • 6 Chawla S, Sun P. SLOM: a new measure for local spatial outliers[J]. Knowledge and Information Systems, 2006, (4): 412-429. Cited by: 1
  • 7 Yin Yaoren, Wang Deguang. Application of an improved k-means clustering algorithm in intrusion detection[J]. Science Technology and Engineering, 2008, 8(16): 4701-4705. Cited by: 7
  • 8 Guha S, Rastogi R, Shim K. CURE: an efficient clustering algorithm for large databases[J]. Information Systems, 2001, 26(1): 35-58. Cited by: 1

Secondary references (7)

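  • 1 Lu Shenglian, Lin Shimin. Research on distance-based outlier detection[J]. Computer Engineering and Applications, 2004, 40(33): 73-75. Cited by: 44
  • 2 Yuan Fang, Meng Zenghui, Yu Ge. Improvement of the k-means clustering algorithm[J]. Computer Engineering and Applications, 2004, 40(36): 177-178. Cited by: 47
  • 3 Agrawal R, Imielinski T, Swami A. Database mining: a performance perspective[J]. IEEE Transactions on Knowledge and Data Engineering, 1993, 5(6): 914-925. Cited by: 1
  • 4 Han J W, Kamber M. Data Mining: Concepts and Techniques[M]. Translated by Fan Ming, Meng Xiaofeng. Beijing: China Machine Press, 2001. 147-158. Cited by: 113
  • 5 Kaufman L, Rousseeuw P J. Finding Groups in Data: an Introduction to Cluster Analysis[M]. New York: John Wiley & Sons, 1990. Cited by: 1
  • 6 Guha S, Rastogi R, Shim K. CURE: an efficient clustering algorithm for large databases[C]. In: Haas L M, Tiwary A, eds. Proceedings of the ACM SIGMOD International Conference on Management of Data. Seattle: ACM Press, 1998: 73-84. Cited by: 1
  • 7 Zhang Yufang, Mao Jiali, Xiong Zhongyang. An improved K-means algorithm[J]. Computer Applications, 2003, 23(8): 31-33. Cited by: 72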

Co-citing documents (26)

Co-cited documents (23)

Citing documents (4)

Secondary citing documents (15)
