
A robust algorithm for cluster initialization using the uniform effect of k-Means    Times cited: 2
Abstract: To improve clustering quality on outlier-contaminated data, a robust clustering initialization algorithm based on the uniform effect of k-Means is proposed. k-Means tends to produce clusters with similar sample counts, which yields three size relationships among clusters: clusters lying in sparse areas have large extents, clusters lying in dense areas have small extents, and neighboring clusters in a dense area have comparable extents. The algorithm first partitions the dataset with k-Means using more clusters than actually exist, so that these size relationships emerge. It then merges neighboring small clusters to recover the dense sample areas of the data space, which serve as the initial clusters, and discards the sparse large clusters. Because outliers are distributed very sparsely, most of them fall into large clusters and therefore have little effect on initialization. Theoretical analysis and experiments demonstrate the effectiveness of the algorithm.
Source: Journal of Huazhong University of Science and Technology (Natural Science Edition), 2010, No. 8, pp. 73-76 (4 pages). Indexed in EI, CAS, CSCD; Peking University Core Journal.
Funding: National Natural Science Foundation of China (Grant No. 60933009)
Keywords: clustering; initialization; outliers; k-Means; agglomerative merge; robust
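The procedure in the abstract (over-partition with k-Means, keep the small clusters, merge neighboring ones, discard the large sparse ones) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the cluster "size" measure (mean member-to-centroid distance), the over-partition factor `over_factor`, the median cutoff `size_quantile`, the neighborhood rule `link_factor`, and ranking merged groups by point count are all hypothetical choices made for illustration.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], m: int, iters: int = 100, seed: int = 0):
    """Plain Lloyd's k-Means with m seeded random initial centers."""
    rng = random.Random(seed)
    centers: List[Point] = [tuple(p) for p in rng.sample(list(points), m)]
    labels: List[int] = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(m), key=lambda j: math.dist(p, centers[j])) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers: List[Point] = []
        for j in range(m):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # an empty cluster keeps its old center
        converged = new_centers == centers
        centers = new_centers
        if converged:
            break
    return centers, labels


def robust_init(points: Sequence[Point], k: int, over_factor: int = 4,
                size_quantile: float = 0.5, link_factor: float = 1.5,
                seed: int = 0) -> List[Point]:
    """Hypothetical sketch: over-partition, keep small clusters, merge neighbors."""
    m = min(len(points), over_factor * k)  # deliberately more clusters than k
    centers, labels = kmeans(points, m, seed=seed)
    members: List[List[Point]] = [[] for _ in range(m)]
    for p, lab in zip(points, labels):
        members[lab].append(p)
    counts = [len(ms) for ms in members]
    # Cluster "size" (extent) = mean distance from members to their centroid.
    radius = [sum(math.dist(p, centers[j]) for p in ms) / len(ms) if ms else math.inf
              for j, ms in enumerate(members)]
    # Clusters at or below the size quantile count as small (dense); the rest,
    # i.e. the large clusters lying in sparse areas, are discarded.
    finite = sorted(r for r in radius if r < math.inf)
    cutoff = finite[int(size_quantile * (len(finite) - 1))]
    small = [j for j in range(m) if radius[j] <= cutoff]
    # Union-find: link two small clusters whose centroids are close
    # relative to their extents (assumed neighborhood rule).
    parent: Dict[int, int] = {j: j for j in small}

    def find(j: int) -> int:
        while parent[j] != j:
            parent[j] = parent[parent[j]]  # path halving
            j = parent[j]
        return j

    for a in small:
        for b in small:
            if a < b and math.dist(centers[a], centers[b]) <= link_factor * (radius[a] + radius[b]):
                parent[find(a)] = find(b)
    groups: Dict[int, List[int]] = {}
    for j in small:
        groups.setdefault(find(j), []).append(j)
    # Keep the k merged groups holding the most points; each initial center is
    # the count-weighted mean of its member centroids.
    ranked = sorted(groups.values(), key=lambda g: sum(counts[j] for j in g), reverse=True)
    init: List[Point] = []
    for g in ranked[:k]:
        total = sum(counts[j] for j in g)
        dim = len(centers[g[0]])
        init.append(tuple(sum(counts[j] * centers[j][d] for j in g) / total for d in range(dim)))
    return init


# Demo: two dense Gaussian blobs plus sparse uniform noise.
rng = random.Random(1)
data = ([(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(200)]
        + [(rng.gauss(10, 0.5), rng.gauss(10, 0.5)) for _ in range(200)]
        + [(rng.uniform(-5, 15), rng.uniform(-5, 15)) for _ in range(20)])
init = robust_init(data, k=2)
print(init)
```

With `k=2` the sketch over-partitions into 8 clusters. Noise points tend to enlarge the extent of whichever clusters absorb them, pushing those clusters above the cutoff, while an isolated outlier that ends up alone in a cluster forms a tiny group that ranks below the dense groups. The result depends on the k-Means seed, so a practical version would likely repeat the over-partition step or use a better seeding scheme.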