Abstract
Sampling is an effective statistical analysis method and is commonly used in large-scale graph data analysis to improve performance. However, most existing graph sampling algorithms over-sample either high-degree or low-degree nodes, which substantially degrades their performance. Complex networks are scale-free: node degrees follow a power-law distribution, so individual nodes differ greatly from one another. Building on sampling methods that use a node selection strategy, and combining them with a strategy based on the approximate degree distribution of nodes, this paper designs and implements SNS, an efficient and unbiased stratified graph sampling algorithm. Experimental results on three real-world graph data sets show that SNS preserves more topological properties than other graph sampling algorithms and runs faster than FFS. SNS outperforms existing algorithms both in the unbiasedness of degree and in how closely the sample approximates the topological properties of the original graph.
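This record does not describe the internals of SNS, so the following is only a minimal sketch of the general idea of degree-stratified node sampling, not the authors' algorithm. The function name `stratified_node_sample`, the power-of-two degree binning, and the proportional allocation with at least one node per stratum are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_node_sample(adj, fraction, seed=0):
    """Return the subgraph induced by a degree-stratified node sample.

    adj maps each node to a list of its neighbours (undirected graph).
    Hypothetical helper for illustration; this is not the SNS algorithm.
    """
    rng = random.Random(seed)

    # Assumed binning: stratum index = floor(log2(degree + 1)), so that
    # heavy-tailed (power-law) degrees fall into a small number of strata.
    strata = defaultdict(list)
    for node, neighbours in adj.items():
        strata[(len(neighbours) + 1).bit_length() - 1].append(node)

    sampled = set()
    for _, nodes in sorted(strata.items()):
        # Proportional allocation, keeping at least one node per stratum so
        # that neither hubs nor low-degree nodes vanish from the sample.
        k = max(1, round(fraction * len(nodes)))
        sampled.update(rng.sample(nodes, k))

    # Keep only the edges whose endpoints were both sampled.
    return {u: [v for v in adj[u] if v in sampled] for u in sampled}


# Toy star graph: hub 0 connected to leaves 1..8.
star = {0: list(range(1, 9)), **{i: [0] for i in range(1, 9)}}
sub = stratified_node_sample(star, fraction=0.5)
```

On the toy star graph, the hub forms its own stratum and is therefore always kept, while half of the leaves are drawn uniformly from the low-degree stratum. Plain uniform node sampling would drop the hub half of the time at this sampling fraction.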
Authors
ZHU Jun-peng (朱君鹏)
LI Hui (李晖)
CHEN Mei (陈梅)
DAI Zhen-yu (戴震宇)
Affiliations: College of Computer Science and Technology, Guizhou University, Guiyang 550025, China; Guizhou Engineering Laboratory of Advance Computing and Medical Information Service, Guiyang 550025, China
Source
Computer Science (《计算机科学》)
CSCD
Peking University Core Journals (北大核心)
2018, No. 11, pp. 249-255 (7 pages)
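Funding
National Natural Science Foundation of China (61562010, 61462012, U1531246)
Major Applied Basic Research Program of Guizhou Province (JZ20142001)
Guizhou Province Innovation Team for Data Analysis Cloud Services (黔科合人才团队字[2015]53)
Graduate Innovation Fund of Guizhou University (研理工2017078)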
Keywords
Biased sampling
Stratified sampling
Graph sampling
Vector clustering
Performance evaluation