Abstract
Many real-world applications generate data in the form of graphs, and a single large graph can contain millions of vertices and billions of edges. To address this, the paper proposes BC-BSP+, a system based on BC-BSP (BSP: Bulk Synchronous Parallel) that processes large graphs iteratively and in parallel. The BSP system offers flexibly configurable strategies (namely disk management parameters) and extended functionality (namely programming interfaces), and it computes large-scale graphs with fault tolerance and load balancing. Three graph partitioning strategies support large-graph processing: a random hash partitioning algorithm (RHP), a load-balancing hash partitioning algorithm (BHP), and a range-based vertex partitioning algorithm (VCRP). Experimental results show that VCRP outperforms BHP and RHP. Using the VCRP partitioning strategy, BC-BSP+ is compared with the MapReduce-based Hadoop, and overall BC-BSP+ processes large graph data more efficiently than Hadoop, Giraph, and Hama.
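The abstract names hash-based and range-based vertex partitioning but does not give the algorithms. The sketch below is only a generic illustration of the two ideas, not the paper's RHP, BHP, or VCRP: the function names, the modulo hash, the equal-width ranges, and the 8-vertex ring graph are all assumptions chosen for the example. It counts how many edges cross partitions under each assignment, which is one common proxy for communication cost in a BSP-style system.

```python
def hash_partition(vertex_id: int, num_partitions: int) -> int:
    """Assign a vertex by hashing its id (here simply id modulo partition count)."""
    return vertex_id % num_partitions


def range_partition(vertex_id: int, num_vertices: int, num_partitions: int) -> int:
    """Assign a vertex to one of several contiguous, equal-width id ranges."""
    range_size = -(-num_vertices // num_partitions)  # ceiling division
    return vertex_id // range_size


def edge_cut(edges, assign) -> int:
    """Count edges whose endpoints land in different partitions."""
    return sum(1 for u, v in edges if assign(u) != assign(v))


# Hypothetical toy input: a ring of 8 vertices split across 2 partitions.
num_vertices, num_partitions = 8, 2
edges = [(i, (i + 1) % num_vertices) for i in range(num_vertices)]

hash_cut = edge_cut(edges, lambda v: hash_partition(v, num_partitions))
range_cut = edge_cut(
    edges, lambda v: range_partition(v, num_vertices, num_partitions)
)

print("edges cut with hash partitioning: ", hash_cut)
print("edges cut with range partitioning:", range_cut)
```

On this toy graph neighbouring vertices have neighbouring ids, so contiguous ranges keep most edges inside one partition. Whether a real graph behaves this way depends on how its vertex ids are assigned, so the result here should not be read as a reproduction of the paper's experiments.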
Source
《兰州文理学院学报(自然科学版)》
2017, No. 3, pp. 88-95 (8 pages)
Journal of Lanzhou University of Arts and Science (Natural Sciences)
Funding
Key Project of the Sichuan Provincial Department of Education (15ZA0339)
Aba Teachers University institutional planning project (ASB12-24)
Keywords
clustering
hierarchical clustering
fuzzy clustering
clustering number
validity function