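Abstract: This paper proposes DOC (detecting overlapping communities over complex network big data), a new overlapping community detection algorithm for complex network big data with time complexity O(n log2(n)). The algorithm draws on modularity clustering and graph computing. It applies new update methods for nodes and edges, uses a balanced binary tree to index modularity increments, and detects overlapping communities by optimizing modularity. Compared with traditional overlapping node detection algorithms, it analyzes each node far less often, so it reaches high identification accuracy at a low running time. Tests on complex network big data sets show that DOC detects overlapping communities effectively and accurately. On large-scale LFR benchmark data sets, its normalized mutual information (NMI) for overlapping community detection reaches up to 0.97, and its average F-score for overlapping node detection is above 0.91. Its running time on complex network big data is also clearly better than that of traditional algorithms.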
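The abstract does not give DOC's node and edge update rules. The sketch below therefore shows only the general idea it builds on: keep the modularity increments ΔQ of candidate community merges in an ordered index, and always apply the best one.

It is plain greedy agglomerative modularity maximisation in the style of Clauset, Newman and Moore. It returns disjoint communities only. It uses Python's `heapq` with lazy deletion as a stand-in for the balanced binary tree, and the function name is hypothetical.

The gain of merging communities i and j is ΔQ = e_ij/m - d_i·d_j/(2m²). Here e_ij is the number of edges between i and j, d_i is the total degree of community i, and m is the number of edges.

```python
import heapq
from collections import defaultdict


def greedy_modularity(edges):
    """Greedy agglomerative modularity maximisation (CNM-style), disjoint communities only.

    edges: list of undirected (u, v) pairs with comparable node ids (e.g. ints), no duplicates.
    """
    m = len(edges)
    deg = defaultdict(int)                        # community -> total degree d_c
    nbrs = defaultdict(lambda: defaultdict(int))  # community -> {adjacent community: edges between}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
        if u != v:
            nbrs[u][v] += 1
            nbrs[v][u] += 1
    members = {n: {n} for n in deg}               # every node starts as its own community
    version = {n: 0 for n in deg}                 # bumped on each merge to invalidate old entries

    def gain(i, j):
        # Delta Q of merging i and j: e_ij/m - d_i*d_j/(2*m^2)
        return nbrs[i][j] / m - deg[i] * deg[j] / (2 * m * m)

    heap = []                                     # ordered index of candidate merges, best first

    def push_pairs(i):
        for j in nbrs[i]:
            heapq.heappush(heap, (-gain(i, j), i, j, version[i], version[j]))

    for n in list(nbrs):
        push_pairs(n)

    while heap:
        neg_gain, i, j, vi, vj = heapq.heappop(heap)
        if i not in members or j not in members or version[i] != vi or version[j] != vj:
            continue                              # stale entry: skipped lazily
        if -neg_gain <= 0:
            break                                 # no remaining merge increases modularity
        members[i] |= members.pop(j)              # merge community j into community i
        deg[i] += deg.pop(j)
        version[i] += 1
        for k, count in nbrs.pop(j).items():      # re-point j's links to i
            if k != i:
                nbrs[i][k] += count
                nbrs[k][i] += count
                del nbrs[k][j]
        del nbrs[i][j]
        push_pairs(i)                             # re-index the merged community's gains
    return sorted(sorted(c) for c in members.values())
```

Consider two triangles joined by a single edge. For that graph, `greedy_modularity([(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)])` returns `[[0, 1, 2], [3, 4, 5]]`. A true balanced search tree would let stale entries be deleted directly, instead of being skipped when they are popped.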
Funding: Supported by the Fundamental Research Funds for the Central Universities of China, the National Natural Science Foundation of China under Grant No. 60905029, and the Natural Science Foundation of Beijing, China, under Grant No. 4112046.
Abstract: In this paper, we propose a balanced multi-label propagation algorithm (BMLPA) for overlapping community detection in social networks. Besides its high speed, another important advantage of our method is good stability, which other multi-label propagation algorithms such as COPRA lack. In BMLPA, we propose a new update strategy that requires the community identifiers of a vertex to have balanced belonging coefficients. This strategy allows vertices to belong to any number of communities. It removes the global limit on the maximum number of community memberships that COPRA requires. We also propose a fast method to generate "rough cores", which can be used to initialize labels for multi-label propagation algorithms and which improve the quality and stability of the results. Experimental results on synthetic and real social networks show that BMLPA is efficient and effective at uncovering overlapping communities.
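This is a minimal sketch of one balanced update step, and the rule it uses is an assumption. A vertex sums its neighbours' belonging coefficients per community. It keeps every community whose total is at least a fraction p of the strongest one, then renormalises the survivors so they sum to 1.

The cut-off is relative to the strongest label rather than a fixed count. As a result, the number of memberships per vertex has no global cap, which is the contrast with COPRA drawn above. The function name and the default p are illustrative, and rough-core initialisation is not shown.

```python
from collections import defaultdict


def balanced_update(neighbor_labels, p=0.5):
    """One balanced multi-label update for a single vertex.

    neighbor_labels: one dict per neighbour, mapping community id -> positive belonging coefficient.
    p: a community survives only if its summed coefficient is at least p times the largest one.
    """
    totals = defaultdict(float)
    for labels in neighbor_labels:
        for community, coeff in labels.items():
            totals[community] += coeff
    if not totals:
        return {}
    strongest = max(totals.values())
    # Keep every label that is balanced against the strongest one; no fixed cap on the count.
    kept = {c: b for c, b in totals.items() if b / strongest >= p}
    norm = sum(kept.values())
    # Renormalise so the surviving belonging coefficients sum to 1.
    return {c: b / norm for c, b in kept.items()}
```

For example, `balanced_update([{"A": 1.0}, {"A": 0.5, "B": 0.5}, {"B": 1.0}])` returns `{"A": 0.5, "B": 0.5}`, so the vertex ends up in both communities with equal weight.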
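Abstract: Overlapping community detection has been one of the research hotspots in the field of complex networks in recent years. This paper proposes SLEM (semi-supervised local expansion method), a semi-supervised, local-expansion method for overlapping community detection. Borrowing from constrained semi-supervised clustering, the method uses not only the topological structure of the network but also the attribute information of its nodes. It works in four steps:

1. Node attributes are converted into pairwise constraints, and the network topology is modified according to these constraints, making the community structure more pronounced.
2. Seed nodes are selected by degree centrality, yielding dispersed seeds with locally high degree, which serve as initial communities.
3. A greedy strategy expands each initial community toward its neighbors, producing locally densely connected communities.
4. Redundant communities are detected and merged, giving a community detection result with high coverage.

Comparative experiments were run on synthetic and real network data against representative overlapping community detection algorithms based on local expansion. The results show that SLEM finds high-quality overlapping community structures on networks of varying sparsity.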
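The four steps can be sketched as follows. Several choices here are illustrative assumptions, not SLEM's published definitions:

- the constraint rule (a must-link adds an edge, a cannot-link removes one);
- the seed rule (a node whose degree is at least that of every neighbour);
- the LFM-style fitness k_in/(k_in + k_out).

The redundancy step here only drops exact duplicates rather than merging near-duplicates, and all function names are hypothetical.

```python
def apply_constraints(adj, must_link=(), cannot_link=()):
    """Hypothetical topology correction: add must-link edges, remove cannot-link edges."""
    adj = {u: set(vs) for u, vs in adj.items()}   # copy so the input graph is not mutated
    for u, v in must_link:
        adj[u].add(v)
        adj[v].add(u)
    for u, v in cannot_link:
        adj[u].discard(v)
        adj[v].discard(u)
    return adj


def fitness(adj, community):
    """LFM-style fitness k_in / (k_in + k_out) of a node set."""
    k_in = sum(1 for u in community for v in adj[u] if v in community)
    k_out = sum(1 for u in community for v in adj[u] if v not in community)
    return k_in / (k_in + k_out) if k_in + k_out else 0.0


def pick_seeds(adj):
    """Seeds: nodes whose degree is at least that of every neighbour (local degree maxima)."""
    return [u for u in adj if all(len(adj[u]) >= len(adj[v]) for v in adj[u])]


def expand(adj, seed):
    """Greedily add the neighbour that most increases fitness; stop when none does."""
    community, current = {seed}, fitness(adj, {seed})
    while True:
        frontier = {v for u in community for v in adj[u]} - community
        best, best_fit = None, current
        for v in sorted(frontier):
            f = fitness(adj, community | {v})
            if f > best_fit:
                best, best_fit = v, f
        if best is None:
            return community
        community.add(best)
        current = best_fit


def local_expansion(adj, must_link=(), cannot_link=()):
    adj = apply_constraints(adj, must_link, cannot_link)
    communities = []
    for seed in pick_seeds(adj):
        c = expand(adj, seed)
        if c not in communities:                  # crude redundancy check: exact duplicates only
            communities.append(c)
    return communities
```

On two triangles joined by the edge (2, 3), both the unconstrained run and a run with `cannot_link=[(2, 3)]` recover the two triangles. The constraint simply makes the split explicit in the topology.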
Funding: Project supported by the National Natural Science Foundation of China (Grant Nos. 61173093 and 61202182), the Postdoctoral Science Foundation of China (Grant No. 2012M521776), the Fundamental Research Funds for the Central Universities of China, the Postdoctoral Science Foundation of Shaanxi Province, China, and the Natural Science Basic Research Plan of Shaanxi Province, China (Grant Nos. 2013JM8019 and 2014JQ8359).
Abstract: Community detection is an important methodology for understanding the intrinsic structure and function of a real-world network. In this paper, we propose an effective and efficient algorithm, the Dominant Label Propagation Algorithm (DLPA), to detect communities in complex networks. The algorithm simulates a special voting process to detect overlapping and non-overlapping community structure in complex networks simultaneously. It is very efficient, since its computational complexity is almost linear in the number of edges in the network. Experimental results on both real-world and synthetic networks show that it also detects community structure with high accuracy.
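DLPA's voting rule is not specified in the abstract. For orientation, the sketch below shows the classic label propagation vote of Raghavan, Albert and Kumara (2007), the usual starting point for label propagation methods. Each node repeatedly adopts the label held by most of its neighbours.

One sweep looks at every edge a constant number of times, which is where the near-linear cost in the number of edges comes from. This baseline yields disjoint communities only. DLPA's dominance criterion and its overlap handling are not reproduced, and the function name is hypothetical.

```python
import random
from collections import Counter


def label_propagation(adj, max_iter=100, seed=0):
    """Classic asynchronous label propagation: each node takes its neighbours' majority label."""
    rng = random.Random(seed)
    labels = {u: u for u in adj}              # every node starts with a unique label
    nodes = list(adj)
    for _ in range(max_iter):
        rng.shuffle(nodes)                    # random update order in each sweep
        changed = False
        for u in nodes:
            if not adj[u]:
                continue                      # isolated nodes keep their own label
            votes = Counter(labels[v] for v in adj[u])
            top = max(votes.values())
            winners = [lab for lab, c in votes.items() if c == top]
            if labels[u] in winners:
                continue                      # current label is already among the winners
            labels[u] = rng.choice(winners)   # break ties at random
            changed = True
        if not changed:
            break                             # converged: every label is a majority label
    groups = {}
    for u, lab in labels.items():
        groups.setdefault(lab, set()).add(u)
    return sorted(sorted(g) for g in groups.values())
```

For two disconnected triangles, `label_propagation(adj)` returns `[[0, 1, 2], [3, 4, 5]]` for any seed. On less clear-cut graphs, the result depends on the random update order and tie-breaking.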
Abstract: There are currently many approaches to identifying the community structure of a network, but relatively few are designed specifically to detect overlapping community structures. Likewise, few networks have ground-truth overlapping nodes. For this reason, we introduce a new network, Pilgrim, with known overlapping nodes, and a new genetic algorithm for detecting such nodes. Pilgrim comprises a variety of structures, including two communities with dense overlap, which is common in real social structures. This study first explores the potential of the community detection algorithm LabelRank for consistent overlap detection. However, the deterministic nature of this algorithm restricts it to very few candidate solutions. We therefore propose a genetic algorithm that uses a restricted edge-based clustering technique to detect overlapping communities by maximizing an efficient overlapping modularity function. The proposed restriction on the edge-based representation precludes the possibility of disjoint communities. This dramatically reduces the search space and the number of generations required to produce an optimal solution. A tunable parameter r adjusts how strictly overlap is defined, which allows the number of identified overlapping nodes to be refined. Tested on several real social networks, our method yields results comparable to the most effective overlapping community detection algorithms to date.
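The abstract does not say how r enters the overlap rule. As a hypothetical illustration, the sketch below uses the edge-based representation an individual might encode: one community label per edge, so that every node inherits labels from its incident edges.

It converts such an edge labelling into overlapping node communities. The threshold r is our own assumption: a node always joins its dominant community, and also joins any other community whose edge count is at least r times the dominant one. The GA operators and the overlapping modularity fitness are not shown, and the function name is hypothetical.

```python
from collections import Counter, defaultdict


def edges_to_node_communities(edge_labels, r=0.5):
    """Map an edge labelling (one community per edge) to overlapping node communities.

    edge_labels: dict {(u, v): community id}.
    r: hypothetical strictness knob in [0, 1]. r=0 puts a node in every community it
       touches; r=1 keeps only communities tied with the node's dominant one.
    """
    incident = defaultdict(Counter)            # node -> {community: number of incident edges}
    for (u, v), c in edge_labels.items():
        incident[u][c] += 1
        incident[v][c] += 1
    communities = defaultdict(set)
    for node, counts in incident.items():
        top = max(counts.values())
        for c, k in counts.items():
            if k / top >= r:                   # the dominant community always passes (k == top)
                communities[c].add(node)
    return {c: sorted(nodes) for c, nodes in communities.items()}
```

Take a triangle labelled "A" that shares node 2 with a denser block labelled "B". With r=0.5, node 2 is reported in both communities. With r=0.8 it stays only in "B", because just 2 of its 5 edges belong to "A".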
Abstract: Because social networks are increasingly large and constantly changing, algorithms for dynamic networks have become an important part of modern community detection. In this paper, we take a well-known static community detection algorithm and modify it to discover communities in dynamic networks. Our dynamic community detection algorithm, SLPA Dynamic (SLPAD), is based on the Speaker-Listener Label Propagation Algorithm (SLPA). Tested on two real dynamic networks, SLPAD reduces the time SLPA would take to run while producing similar, and in some cases better, communities. We compared SLPAD to SLPA, LabelRankT, and another algorithm we developed, the Dynamic Structural Clustering Algorithm for Networks Overlapping (DSCAN-O). This comparison further tests SLPAD's validity and its ability to detect overlapping communities relative to other community detection algorithms. SLPAD is faster than all of these algorithms while producing communities with equally high modularity on each network.
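Below is a compact version of the standard SLPA loop. Speakers send a label drawn from their memory, each listener stores the most frequent label it hears, and a post-processing threshold r decides memberships.

The `memory` and `active` parameters are a hypothetical stand-in for the dynamic idea: reuse the previous run's memories, and re-run only the listeners touched by a change. This is a sketch under those assumptions, not SLPAD's published update rule.

```python
import random
from collections import Counter, defaultdict


def slpa(adj, iterations=20, r=0.1, seed=0, memory=None, active=None):
    """Speaker-listener label propagation with optional reuse of earlier memories.

    adj: dict node -> set of neighbours (undirected).
    r: post-processing threshold; a node keeps labels making up at least r of its memory.
    memory / active: hypothetical hooks for the dynamic setting: start from previous
        memories and let only the `active` listeners (e.g. nodes on changed edges) update.
    """
    rng = random.Random(seed)
    memory = {u: list(labels) for u, labels in (memory or {}).items()}  # copy, never mutate input
    for u in adj:
        memory.setdefault(u, [u])              # new nodes start with their own label
    listeners = sorted(adj if active is None else active)
    for _ in range(iterations):
        rng.shuffle(listeners)
        for u in listeners:
            if not adj[u]:
                continue
            # Each neighbour speaks one label drawn from its memory (frequency-weighted).
            heard = Counter(rng.choice(memory[v]) for v in adj[u])
            top = max(heard.values())
            # The listener keeps the most popular label it heard (ties broken at random).
            memory[u].append(rng.choice([lab for lab, c in heard.items() if c == top]))
    communities = defaultdict(set)
    for u in adj:
        counts = Counter(memory[u])
        for lab, c in counts.items():
            if c / len(memory[u]) >= r:
                communities[lab].add(u)
    return memory, dict(communities)
```

After a first full run, a new node attached to an existing group can be handled by one extra call. That call passes the old memories and lists only the new node and its neighbours in `active`, so untouched nodes keep their memories unchanged.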
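Abstract: To further improve overlapping community detection, this paper proposes a new definition of node importance based on node degree and the node clustering coefficient. Nodes are updated in descending order of importance, which fixes the node update order and improves the stability of community detection. On this basis, the paper proposes OCD-GEMPA (overlapping community detection based on graph embedding and multi-label propagation algorithm). The algorithm uses the node2vec model to represent nodes as low-dimensional vectors and builds a matrix of weights between nodes. From these weights it computes label belonging coefficients and selects labels accordingly, avoiding random label selection. The algorithm was evaluated on real and synthetic datasets. Compared with other overlapping community detection algorithms, OCD-GEMPA clearly improves on both the EQ and NMI metrics, showing better accuracy and stability.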
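A sketch of the update order and label coefficients, under explicit assumptions. node2vec itself is not reproduced: the embedding vectors are taken as given, and the example uses toy 2-D vectors.

Three further choices are assumptions for illustration, since the abstract does not give the exact formulas:

- an importance score of degree plus clustering coefficient;
- cosine similarity as the edge weight;
- clipping negative weights to zero.

All function names are hypothetical.

```python
import math
from collections import defaultdict


def clustering(adj, u):
    """Local clustering coefficient: fraction of neighbour pairs that are themselves linked."""
    nbrs = list(adj[u])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k) if nbrs[j] in adj[nbrs[i]])
    return 2 * links / (k * (k - 1))


def update_order(adj):
    """Nodes sorted by a hypothetical importance score (degree + clustering), highest first."""
    return sorted(adj, key=lambda u: (-(len(adj[u]) + clustering(adj, u)), u))


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def belonging_coefficients(adj, emb, labels, u):
    """Weight each neighbour's labels by embedding similarity to u; normalise to sum to 1."""
    scores = defaultdict(float)
    for v in adj[u]:
        w = max(cosine(emb[u], emb[v]), 0.0)     # negative similarity ignored (assumption)
        for lab in labels[v]:
            scores[lab] += w / len(labels[v])    # split a neighbour's weight over its labels
    total = sum(scores.values())
    return {lab: s / total for lab, s in scores.items()} if total else {}
```

Because labels are chosen from these weighted coefficients, two neighbours with different labels no longer produce a coin-flip tie unless their embedding similarities are also equal.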