
Improved Deep Embedding Clustering with Ensemble Learning (基于集成学习的改进深度嵌入聚类算法)

Cited by: 3
Abstract: In recent years, the rapid development of deep learning has provided a powerful tool for clustering research and has given rise to a number of clustering methods based on deep neural networks. Among these methods, deep embedding clustering (DEC) has been drawing increasing attention due to its ability to perform deep representation learning and optimize clustering assignment simultaneously. However, one limitation of DEC lies in its sensitivity to the hyper-parameter λ, which often has to be resolved by manual tuning. To address this problem, this paper presents an improved deep embedding clustering method with ensemble learning (IDECEL). Instead of searching for a single optimal hyper-parameter, IDECEL uses a set of diversified values of λ to construct an ensemble of diverse base clusterings. By exploiting the concept of entropy, the uncertainty of the clusters in these base clusterings is evaluated and the clusters are weighted accordingly. A locally weighted bipartite graph is then constructed between the base clusters and the data samples and efficiently partitioned to obtain a better clustering result. Experimental results on multiple datasets show that IDECEL not only alleviates the hyper-parameter sensitivity problem of DEC, but also exhibits more robust clustering performance than several other deep clustering and ensemble clustering methods.
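The abstract outlines a pipeline: run DEC under several values of λ to obtain diverse base clusterings, weight each base cluster by an entropy-based uncertainty estimate, build a locally weighted bipartite graph between clusters and samples, and partition that graph. The sketch below only illustrates this general idea under stated assumptions, not the authors' exact algorithm: the base clusterings are assumed to be given as label vectors (e.g. produced by separate DEC runs with different λ), the weight exp(-H/(θ·M)) and the SVD-based bipartite spectral partitioning are plausible choices consistent with the description, and the names cluster_weights, consensus and the parameter theta are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans


def cluster_weights(base_labels, theta=0.4):
    """Entropy-based cluster weighting (a sketch of one plausible scheme).

    base_labels: list of M label vectors, each of shape (n,), assumed to come
    from DEC runs trained with different values of lambda.
    For every cluster in every base clustering, measure how consistently its
    members are grouped by the other base clusterings (Shannon entropy), and
    map low entropy (low uncertainty) to a high weight via exp(-H / (theta*M)).
    """
    M = len(base_labels)
    clusters, weights = [], []          # membership masks and their weights
    for lab in base_labels:
        for c in np.unique(lab):
            members = (lab == c)
            H = 0.0
            for lab2 in base_labels:
                # distribution of this cluster's members over lab2's clusters
                _, counts = np.unique(lab2[members], return_counts=True)
                p = counts / counts.sum()
                H += -(p * np.log2(p)).sum()
            clusters.append(members)
            weights.append(np.exp(-H / (theta * M)))
    return np.array(clusters), np.array(weights)


def consensus(base_labels, n_clusters, theta=0.4, seed=0):
    """Build a locally weighted bipartite graph between samples and base
    clusters, then partition it with standard bipartite spectral partitioning
    (normalized SVD embedding + k-means). Illustrative only."""
    clusters, w = cluster_weights(base_labels, theta)
    # (n_samples, n_total_clusters) weighted incidence matrix
    B = clusters.T.astype(float) * w
    # symmetric degree normalization of the bipartite graph
    d_r = np.maximum(B.sum(axis=1, keepdims=True), 1e-12)
    d_c = np.maximum(B.sum(axis=0, keepdims=True), 1e-12)
    Bn = B / np.sqrt(d_r) / np.sqrt(d_c)
    U, _, _ = np.linalg.svd(Bn, full_matrices=False)
    emb = U[:, :n_clusters]
    emb /= np.maximum(np.linalg.norm(emb, axis=1, keepdims=True), 1e-12)
    return KMeans(n_clusters, n_init=10, random_state=seed).fit_predict(emb)
```

As a usage sketch, `consensus([labels_run1, labels_run2, labels_run3], n_clusters=10)` would fuse three base clusterings into one consensus labeling; the bipartite formulation keeps the graph size at n_samples + n_clusters edges per sample, which is what makes the final partitioning step cheap.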
Authors: HUANG Yuxiang; HUANG Dong; WANG Changdong; LAI Jianhuang (College of Mathematics and Informatics, South China Agricultural University, Guangzhou 510642, China; Guangzhou Key Laboratory of Intelligent Agriculture, Guangzhou 510642, China; School of Computer Science and Engineering, Sun Yat-Sen University, Guangzhou 510006, China)
Source: Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》, CSCD, Peking University Core Journal), 2021, Issue 10, pp. 1949-1957 (9 pages)
Funding: National Natural Science Foundation of China (61976097, 61876193, 61876104); Natural Science Foundation of Guangdong Province (2021A1515012203); Guangzhou Key Laboratory of Intelligent Agriculture Project (201902010081).
Keywords: data clustering; deep clustering; ensemble clustering; ensemble learning; sensitive hyper-parameters

