In the big data era, data are generated from different sources or observed from different views. Such data are referred to as multi-view data. Unleashing the power of the knowledge in multi-view data is very important in big data mining and analysis. This calls for advanced techniques that consider the diversity of the different views while fusing the data. Multi-view clustering (MvC) has attracted increasing attention in recent years by aiming to exploit complementary and consensus information across multiple views. This paper summarizes a large number of multi-view clustering algorithms, provides a taxonomy according to the mechanisms and principles involved, and classifies these algorithms into five categories: co-training style algorithms, multi-kernel learning, multi-view graph clustering, multi-view subspace clustering, and multi-task multi-view clustering. Multi-view graph clustering is further categorized into graph-based, network-based, and spectral-based methods. Multi-view subspace clustering is further divided into subspace learning-based and non-negative matrix factorization-based methods. This paper not only introduces the mechanisms of each category of methods but also gives a few examples of how these techniques are used. In addition, it lists some publicly available multi-view datasets. Overall, this paper serves as an introductory text and survey for multi-view clustering.
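The emergence of multi-view data challenges traditional single-view clustering algorithms. Methods that partition each view independently with a single-view clustering algorithm and then obtain a global partition through an ensemble mechanism artificially sever the intrinsic connections between views, and they rarely achieve satisfactory clustering results. To address this problem, a multi-view clustering model is proposed that considers not only the partition quality within each view but also a collaborative learning mechanism across views. Within each view, a multi-medoid representation of the cluster structure is adopted to capture more accurate intra-cluster structural information. Across views, the representative points (medoids) of a cluster are assumed to remain representative in every view. Based on this model, a multi-view fuzzy clustering algorithm with a medoid invariant constraint (MFCMddI) is proposed. The algorithm enforces medoid consistency by maximizing the sum of the products of the medoid weight coefficients over each pair of adjacent views. The objective function of MFCMddI can be optimized by introducing Lagrange multipliers and the KKT conditions. Experiments on synthetic and real-world datasets show that the algorithm has certain advantages over the comparison algorithms.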
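As a rough illustration of the medoid-consistency idea in this abstract, the sketch below evaluates a sum of products of medoid weights over adjacent views. The array layout (one `(n_clusters, n_points)` weight matrix per view) and the function name `medoid_consistency` are assumptions made for illustration; the abstract does not give the exact objective of MFCMddI, so this is only the consistency term as described in words, not the full algorithm.

```python
import numpy as np

def medoid_consistency(weights):
    """Sum of element-wise products of medoid weights over adjacent views.

    weights: list of arrays, one per view, each of shape (n_clusters, n_points).
    Entry [c, j] is the weight of point j as a representative of cluster c
    in that view (hypothetical layout, assumed for this sketch).
    """
    total = 0.0
    # Pair view v with view v + 1 ("adjacent" views).
    for a, b in zip(weights[:-1], weights[1:]):
        total += float((a * b).sum())
    return total

# Two clusters, three points, three views.
same = np.array([[1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0]])
other = np.array([[0.0, 0.0, 1.0],
                  [0.0, 0.0, 1.0]])

agree = medoid_consistency([same, same, same])
disagree = medoid_consistency([same, other, same])
```

The term is larger when the same points carry high weight for the same cluster in neighbouring views, which is the behaviour the abstract attributes to the constraint.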
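In K-means-type multi-view clustering algorithms, the final clustering result is affected by the initial cluster centers. This work therefore studies how different initial-center selection methods influence K-means-type multi-view clustering and proposes a sampling-based active initial-center selection method, sampled clustering by fast search and find of density peaks (SDPC). The method uniformly samples the dataset and combines the clustering by fast search and find of density peaks (DPC) algorithm with a K-means re-iteration strategy to further improve the efficiency of initial-center selection and to address the problem of determining the number of clusters in multi-view clustering. Experiments verify the influence of different initialization methods on K-means-type multi-view clustering. Results on multi-view benchmark datasets show that global (kernel) K-means initialization suffers from excessive time complexity, that AFKMC^2 (assumption-free K-Markov chain Monte Carlo) initialization is suitable for large-scale data, that DPC can actively select the number of clusters and the initial cluster centers, and that SDPC, compared with DPC, not only obtains the number of clusters actively but also achieves a good trade-off between clustering accuracy and efficiency.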
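The sampling-plus-DPC step described above can be sketched as follows. This is a minimal illustration under several assumptions, not the authors' SDPC: it works on a single feature matrix rather than multiple views, uses a Gaussian-kernel density with a hand-picked cutoff `dc`, takes the number of centers `k` as an input instead of selecting it actively, and omits the K-means re-iteration. The function names are hypothetical.

```python
import numpy as np

def dpc_scores(X, dc):
    """Return (rho, delta) of density-peaks clustering for the rows of X."""
    # Pairwise Euclidean distances, shape (n, n).
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Gaussian-kernel local density; subtract 1 to drop the self term.
    rho = np.exp(-(d / dc) ** 2).sum(axis=1) - 1.0
    n = len(X)
    delta = np.empty(n)
    for i in range(n):
        higher = rho > rho[i]
        # delta: distance to the nearest point of higher density.
        # The densest point has none, so it gets its largest distance.
        delta[i] = d[i, higher].min() if higher.any() else d[i].max()
    return rho, delta

def sampled_dpc_init(X, n_sample, dc, k, seed=0):
    """Pick k initial centers by running DPC on a uniform sample of X."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=min(n_sample, len(X)), replace=False)
    S = X[idx]
    rho, delta = dpc_scores(S, dc)
    # Density peaks have both high density and high delta.
    top = np.argsort(rho * delta)[::-1][:k]
    return S[top]

# Usage on two well-separated synthetic blobs.
data_rng = np.random.default_rng(1)
X = np.vstack([data_rng.normal(0.0, 0.5, (50, 2)),
               data_rng.normal(10.0, 0.5, (50, 2))])
centers = sampled_dpc_init(X, n_sample=60, dc=1.0, k=2, seed=0)
```

Running DPC on the sample keeps the pairwise-distance matrix at `n_sample × n_sample` instead of `n × n`, which is the efficiency argument for sampling; the returned centers would then seed the K-means-type iterations.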
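Large-scale multi-view clustering aims to overcome the slow computation and high space complexity that prevent traditional multi-view clustering algorithms from scaling to large datasets. Among such approaches, anchor-based multi-view clustering uses a set of anchors drawn from the whole dataset to construct a reconstruction matrix that expresses the data in terms of the anchors, and then clusters on that matrix, which effectively reduces the time and space complexity of the algorithm. However, existing methods ignore the differences among anchors and treat all anchors equally, so the clustering results are limited by low-quality anchors. To locate more discriminative anchors and strengthen the influence of high-quality anchors on clustering, a large-scale multi-view clustering algorithm based on weighted anchors (multi-view clustering with weighted anchors, MVC-WA) is proposed. By introducing an adaptive anchor weighting mechanism, the method determines the anchor weights and constructs the anchor graph within a unified framework. In addition, to increase anchor diversity, the anchor weights are further adjusted according to the similarity between anchors. Comparative experiments against state-of-the-art large-scale multi-view clustering algorithms on nine benchmark datasets verify the efficiency and effectiveness of the proposed method.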
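To make the anchor-graph idea concrete, here is a minimal sketch of building a per-view reconstruction matrix from anchors and scaling its columns by anchor weights. It is not MVC-WA itself: the weights here are supplied by the caller rather than learned in a unified framework, the anchors are simply rows picked by index, and the Gaussian similarity and the names `anchor_graph` and `weighted_anchor_embedding` are assumptions for illustration.

```python
import numpy as np

def anchor_graph(X, anchors, sigma=1.0):
    """Row-normalised Gaussian similarity of each point to each anchor."""
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)
    Z = np.exp(-d2 / (2.0 * sigma ** 2))
    return Z / Z.sum(axis=1, keepdims=True)

def weighted_anchor_embedding(views, anchor_idx, weights, k, sigma=1.0):
    """Spectral embedding from weighted anchor graphs of several views.

    views: list of (n, d_v) arrays describing the same n samples.
    anchor_idx: indices of the samples used as anchors (assumed given).
    weights: (m,) non-negative anchor weights (assumed given, not learned).
    """
    blocks = []
    for X in views:
        Z = anchor_graph(X, X[anchor_idx], sigma)   # shape (n, m)
        blocks.append(Z * weights[None, :])         # emphasise good anchors
    Z_all = np.hstack(blocks) / np.sqrt(len(views))
    # Thin SVD of an (n, m * n_views) matrix: cost is linear in n.
    U, _, _ = np.linalg.svd(Z_all, full_matrices=False)
    return U[:, :k]

# Usage with two random views of 40 samples and 8 anchors.
rng = np.random.default_rng(0)
views = [rng.normal(size=(40, 3)), rng.normal(size=(40, 5))]
anchor_idx = np.arange(0, 40, 5)
weights = np.ones(len(anchor_idx))
Z0 = anchor_graph(views[0], views[0][anchor_idx])
U = weighted_anchor_embedding(views, anchor_idx, weights, k=3)
```

The rows of `U` would then be clustered, for example with K-means. Because only an `n × m` matrix per view is ever formed, memory and time grow linearly with the number of samples for a fixed number of anchors, which is the scalability argument made in the abstract.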
Funding: Supported in part by the National Natural Science Foundation of China (No. 61572407).