Abstract
Clustering in heterogeneous information networks is an emerging problem. The recently proposed RankClus algorithm integrates ranking with clustering, two techniques previously regarded as unrelated, so that each reinforces the other, offering a new approach to mining heterogeneous networks. However, RankClus clusters only the objects of a specified target type, so its results cover neither the complete structure of the heterogeneous network nor the data of the other types. By introducing co-clustering and combining it with ranking, we propose a clustering algorithm called RankCoClus. First, a ranking distribution matrix is obtained from a ranking distribution generative model based on posterior probability; then co-clustering is used to cluster objects of different types simultaneously. This process clusters the different node types of a heterogeneous information network at the same time and also improves the consistency between the clusters of the different types. Comparative experiments on the real DBLP four-area data set and on synthetic data sets show that RankCoClus outperforms both RankClus and the classic co-clustering algorithm in accuracy and cluster consistency.
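The co-clustering step mentioned in the abstract, in which two object types are clustered at the same time, can be illustrated with a small sketch. The abstract does not specify the co-clustering method used in RankCoClus, so the code below swaps in a well-known technique, two-way spectral co-clustering of a bipartite matrix, with a simple sign threshold in place of k-means. The function name `spectral_cocluster_2way` and the toy matrix `A_toy` (a stand-in for a score matrix between two object types, for example venues and authors) are hypothetical and not taken from the paper.

```python
import numpy as np

def spectral_cocluster_2way(A):
    """Split the rows and columns of a non-negative matrix into two co-clusters.

    Illustrative only: this is generic spectral bipartitioning, not the
    RankCoClus procedure, whose details the abstract does not give.
    """
    A = np.asarray(A, dtype=float)
    d_rows = A.sum(axis=1)
    d_cols = A.sum(axis=0)
    if np.any(d_rows <= 0) or np.any(d_cols <= 0):
        raise ValueError("every row and column needs at least one positive entry")
    r = 1.0 / np.sqrt(d_rows)
    c = 1.0 / np.sqrt(d_cols)
    # Degree-normalized matrix D_r^{-1/2} A D_c^{-1/2}
    An = r[:, None] * A * c[None, :]
    U, s, Vt = np.linalg.svd(An, full_matrices=False)
    # The first singular pair is trivial; the second carries the two-way split.
    z_rows = r * U[:, 1]
    z_cols = c * Vt[1, :]
    # Rows and columns with the same sign land in the same co-cluster.
    row_labels = (z_rows > 0).astype(int)
    col_labels = (z_cols > 0).astype(int)
    return row_labels, col_labels

# Hypothetical toy data: rows 0-1 link mostly to columns 0-2,
# rows 2-3 link mostly to columns 3-5, with a few weak cross links.
A_toy = np.array([
    [5, 4, 3, 0, 0, 1],
    [4, 5, 2, 1, 0, 0],
    [0, 1, 0, 4, 5, 3],
    [1, 0, 0, 3, 4, 5],
])
rows, cols = spectral_cocluster_2way(A_toy)
print("row labels:", rows)
print("column labels:", cols)
```

Because rows and columns are embedded on the same axis, a row and a column that receive the same label belong to the same co-cluster, which is the kind of cross-type consistency the abstract refers to. Which group is numbered 0 or 1 is arbitrary, since it depends on the sign of the singular vector.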
Source
Journal of Chinese Computer Systems (《小型微型计算机系统》)
CSCD
Peking University Core Journals (北大核心)
2014, Issue 11, pp. 2445-2449 (5 pages)
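Funding
Supported by the National Natural Science Foundation of China (60805042)
Supported by the Natural Science Foundation of Fujian Province (2010J01329, 2011J05150, 2012J01262, 2013J01231)
Supported by the Major Industry-University Cooperation Project of Fujian Province (2011H6014)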
Keywords
clustering
ranking distribution
collaboration
heterogeneous information network