A robust algorithm for cluster initialization using uniform effect of k-Means

Cited by: 2

Abstract: To improve the clustering quality of noise-contaminated data, a robust clustering initialization algorithm based on the uniform effect of k-Means is proposed. Because of this uniform effect, the sub-clusters produced by k-Means contain roughly equal numbers of samples. This leads to definite relationships among their spatial extents: sub-clusters lying in sparse areas are all large, sub-clusters lying in dense areas are all small, and neighboring sub-clusters in a dense area have comparable extents. The algorithm first partitions the dataset with k-Means using more clusters than the actual number, so that these relationships readily appear. It then merges neighboring small sub-clusters to obtain the dense sample areas of the data space, which serve as the initial clusters, and it discards the large, sparse sub-clusters. Outliers are distributed very sparsely, so most of them fall into large sub-clusters and have very little effect on the initialization. The algorithm therefore finds clusters of similar samples automatically while effectively eliminating noise, which yields a robust clustering initialization. Theoretical analysis and various experiments show the effectiveness of the proposed algorithm.
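The abstract does not state the exact merge or discard rules. What follows is therefore only a minimal Python sketch of the over-cluster, merge and discard idea, written under stated assumptions.

The intuition behind the uniform effect is roughly this. If each of the K sub-clusters holds about n/K samples, then a sub-cluster in a region of density ρ occupies a volume of about n/(Kρ). Sparse regions therefore yield spatially large sub-clusters and dense regions yield small ones.

Everything concrete below is hypothetical and not taken from the paper:
- the helper names `kmeans_labels`, `robust_init` and `demo_points`;
- the use of mean distance to the centroid as a sub-cluster's "size";
- the parameter `k_over`;
- the parameter `large_factor`: discard sub-clusters whose radius exceeds this multiple of the median radius;
- the parameter `merge_factor`: link two small sub-clusters when their centroid distance is within this multiple of their summed radii;
- taking the k largest merged groups as the initial clusters.

```python
import math
import random


def centroid(pts):
    """Coordinate-wise mean of a non-empty list of points (tuples)."""
    return tuple(sum(coord) / len(pts) for coord in zip(*pts))


def kmeans_labels(points, k, iters=100, seed=0):
    """Plain Lloyd's k-Means started from k randomly sampled points; returns labels."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        # Update step: move each center to its members' mean (empty clusters stay put).
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(centroid(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def robust_init(points, k, k_over=20, large_factor=3.0, merge_factor=2.0, seed=0):
    """Hypothetical sketch: over-cluster, drop large sub-clusters, merge small neighbors."""
    labels = kmeans_labels(points, k_over, seed=seed)
    # 1. Describe each non-empty sub-cluster by its members, centroid and mean radius.
    subs = []
    for j in range(k_over):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if members:
            c = centroid(members)
            r = sum(math.dist(p, c) for p in members) / len(members)
            subs.append((members, c, r))
    # 2. Sub-clusters hold similar sample counts, so a large radius marks a sparse
    #    region: discard those whose radius exceeds large_factor * median radius.
    radii = sorted(r for _, _, r in subs)
    limit = large_factor * radii[len(radii) // 2]
    small = [s for s in subs if s[2] <= limit]
    # 3. Merge small sub-clusters lying close together (union-find on a proximity graph).
    parent = list(range(len(small)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(small)):
        for j in range(i + 1, len(small)):
            (_, ci, ri), (_, cj, rj) = small[i], small[j]
            if math.dist(ci, cj) <= merge_factor * (ri + rj):
                parent[find(i)] = find(j)
    groups = {}
    for i, (members, _, _) in enumerate(small):
        groups.setdefault(find(i), []).extend(members)
    # 4. The k merged groups with the most samples give the initial centers.
    largest = sorted(groups.values(), key=len, reverse=True)[:k]
    return [centroid(g) for g in largest]


def demo_points():
    """Hypothetical data: two dense 9x9 grids plus six far-away outliers."""
    grid = [(-1 + 0.25 * i, -1 + 0.25 * j) for i in range(9) for j in range(9)]
    shifted = [(x + 10, y + 10) for x, y in grid]
    outliers = [(-30.0, 25.0), (35.0, -20.0), (40.0, 40.0),
                (-25.0, -30.0), (5.0, 45.0), (45.0, 5.0)]
    return grid + shifted + outliers


if __name__ == "__main__":
    for center in robust_init(demo_points(), k=2):
        print(center)
```

On the demo data, the script should print one center inside each grid. The far outliers end up either in large, discarded sub-clusters or in tiny groups that are never among the k largest.

Union-find makes merging transitive, so a chain of adjacent small sub-clusters collapses into one dense region. This mirrors the "merge neighboring small sub-clusters" step in spirit, not necessarily in detail.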

Authors: Peng Liuqing, Zhang Junying, Xu Jin

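Affiliations: School of Computer Science, Xidian University; College of Life Science and Technology, Huazhong University of Science and Technology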

Source: Journal of Huazhong University of Science and Technology (Natural Science Edition), 2010, No. 8, pp. 73-76 (4 pages). Indexed in EI, CAS and CSCD; Peking University core journal.

Funding: National Natural Science Foundation of China (60933009)

Keywords: clustering initialization; outliers; k-Means; agglomerative merge; robust

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
