Abstract
Traditional clustering algorithms incur excessive time cost when applied to large-scale information networks. To address this, and drawing on the statistical characteristics of such networks, this paper proposes a "divide and conquer" strategy over the network topology that substantially reduces the size of the clustering problem and its time cost while maintaining comparable clustering quality. The paper makes three main contributions: (1) it proposes dividing the whole information network into layers according to a ranking of clustering influence and then clustering each layer separately; (2) based on statistically observed structural properties of particular information networks, such as the rich-club phenomenon and hierarchical community structure, it designs a rough scheme for partitioning a network into layers and shows experimentally that the scheme is reasonable to a certain extent; (3) it presents an iterative inter-layer cluster merging algorithm that fuses the clusters obtained in different layers. Experimental results show that the proposed algorithm achieves good clustering quality while markedly reducing computational cost.
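The layer-then-merge idea summarized in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: node degree stands in for the clustering influence rank, connected components stand in for the per-layer clustering algorithm, the network is cut into only two layers, and the merge is a single pass rather than the iterative procedure the paper describes. The function names (`split_layers`, `components`, `merge_layers`, `cluster_by_layers`) and the `top_fraction` parameter are hypothetical.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def split_layers(adj, top_fraction=0.25):
    """Cut nodes into a core layer and a periphery layer.

    Assumption: degree is used as a stand-in for 'clustering influence'.
    """
    ranked = sorted(adj, key=lambda n: (-len(adj[n]), n))
    k = max(1, int(len(ranked) * top_fraction))
    return set(ranked[:k]), set(ranked[k:])


def components(adj, nodes):
    """Placeholder clustering: connected components of the induced subgraph."""
    seen, clusters = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            u = stack.pop()
            comp.add(u)
            for v in adj[u]:
                if v in nodes and v not in seen:
                    seen.add(v)
                    stack.append(v)
        clusters.append(comp)
    return clusters


def merge_layers(adj, core_clusters, periphery_clusters):
    """Attach each periphery cluster to the core cluster it shares most edges with.

    Single pass only; ties go to the core cluster with the lowest index.
    """
    merged = [set(c) for c in core_clusters]
    owner = {n: i for i, c in enumerate(core_clusters) for n in c}
    for pc in periphery_clusters:
        votes = defaultdict(int)
        for u in pc:
            for v in adj[u]:
                if v in owner:
                    votes[owner[v]] += 1
        if votes:
            best = max(sorted(votes), key=votes.get)
            merged[best] |= pc
        else:
            merged.append(set(pc))  # no link to the core: keep as its own cluster
    return merged


def cluster_by_layers(adj, top_fraction=0.25):
    """Layer the network, cluster each layer separately, then merge."""
    core, periphery = split_layers(adj, top_fraction)
    return merge_layers(adj, components(adj, core), components(adj, periphery))


# Toy network: two hubs "a" and "b", joined only through the weak link a3-b.
edges = [
    ("a", "a1"), ("a", "a2"), ("a", "a3"), ("a1", "a2"), ("a1", "a3"),
    ("b", "b1"), ("b", "b2"), ("b", "b3"), ("a3", "b"),
]
result = cluster_by_layers(build_adjacency(edges))
print(result)
```

On this toy network the two hubs form the core layer, each hub becomes its own core cluster, and the periphery groups are attached to the hub they are most strongly linked to. The potential saving comes from each clustering call seeing only one layer's nodes instead of the whole network.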
Source
《计算机科学与探索》
CSCD
2014, Issue 4, pp. 406-416 (11 pages)
Journal of Frontiers of Computer Science and Technology
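Funding
National Natural Science Foundation of China, Grant No. 61103043
National Key Technology R&D Program of China during the 12th Five-Year Plan, Grant No. 2012BAG04B02
Open Fund of the State Key Laboratory of Software Engineering, Wuhan University, Grant No. SKLSE2012-09-26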