期刊文献+

一种基于MapReduce的不确定图上的相似性连接方法

Method of Similarity Join on Uncertain Graphs Using MapReduce
下载PDF
导出
摘要 相比于确定图上的相似性连接,不确定图上的相似性连接通常具有更大的实际应用价值以及计算复杂性。文中研究了基于MapReduce分布式编程框架的不确定图上的相似性连接问题,提出了基于概率和的Map方剪枝和Reduce方剪枝的两种剪枝策略。Map方剪枝策略在映射过程中过滤掉了不可能具有相似图的不确定图。Reduce方剪枝策略用于减少约减过程中的候选图对。基于这两种剪枝策略,文中提出了一种基于MapReduce框架的不确定图上的相似性连接算法MUGSJoin。实验结果证明,该算法与同类算法相比具有更好的性能和可扩展性。 Compared to the similarity join on the deterministic graph,the similarity join on the uncertain graph usually has greater practical application value and higher computational complexity.This paper studied the similarity join on uncertain graph databases based on MapReduce distributed programming framework,and proposed two pruning strategies using the existence probability,namely Map Side Pruning and Reduce Side Pruning.The Map Side Pruning can filter out the uncertain graphs at the Map side which have no chance to be similar with any other uncertain graph at the Map side.The Reduce Side Pruning can reduce the candidate pairs at Reduce side in the process of reduction.Based on the above pruning approaches,this paper proposed a similarity join algorithm MUGSJoin on uncertain graph databases based on MapReduce.The experiment results show that the proposed approach has much better performance and scala-bility than the similar algorithms.
作者 缪丰羽 王宏志 阮群生 MIAO Feng-yu;WANG Hong-zhi;RUAN Qun-sheng(College of Information&Mechanical and Electrical Engineering,Ningde Normal University,Ningde,Fujian 352100,China;School of Computer Science and Technology,Harbin Institute of Technology,Harbin 150001,China;College of Software,Xiamen University,Xiamen,Fujian 361000,China)
出处 《计算机科学》 CSCD 北大核心 2018年第12期298-307,共10页 Computer Science
基金 国家自然科学基金重点项目(U1509216) 国家自然科学基金面上项目(61472099 61602129) 国家重点研发计划项目(2016YFB1000703) 福建省自然科学基金面上项目(2018J01555) 福建省教育厅中青年项目(JAT170653)资助
关键词 MAPREDUCE 不确定图 相似性连接 MapReduce Uncertain graph Similarity join
  • 相关文献

参考文献4

二级参考文献71

  • 1李国杰.大数据研究的科学价值[J].中国计算机学会通讯,2012,8(9):8-15. 被引量:122
  • 2CIO时代网.浅析大数据的特点[EB/OL]. (2012-05-08). [2013-5-20] .http://www.ciotimes.com /baike/62989.html. 被引量:1
  • 3Viktor M S, Kenneth C. Big data: A Revolution that will trans-formhowwelive,workandthink[M].盛杨燕,周涛,译.杭州:浙江人民出版社,2013:9-10. 被引量:1
  • 4Miller R. 'Digital universe' nears a zettabyte[EB/OL]. [2012- 11-24]. http://www, datacenterknowledge, corn/archives/ 2010/05/04/digital-universe-nears-a-zettabyte/. 被引量:1
  • 5Broder A Z, Glassman S C, Manasse M S, et al. Syntactic cluste- ring of the web[J]. Computer Networks and ISCN Systems, 1997,29(8-13) : 1157-1166. 被引量:1
  • 6Hoad T C, Zobel J. Methods for identifying versioned and plagi- arized doeuments[J]. Journal of the American Society for Infor- mation Science and Technology, 2003,54(3) :203-215. 被引量:1
  • 7Kolb L,Thor A, Rahm E. Load balancing for MapReduce-based entity resolution [C]//Proc of ICDE. Washington: IEEE, 2012: 618-629. 被引量:1
  • 8Cho J, Shivakumar N, Garcia-Molina H. Finding replicated web collections [C]//Proc of SIGMOD. New York: ACM,2000:355- 366. 被引量:1
  • 9相似图片搜索引擎TinEyeEEB/OL].[2012-11-24].http://www.ipc.me/tineye.html. 被引量:1
  • 10Dean J, Ghemawat S. MapReduce large clusters[C]//Proc of OSDI. simplified data processing on New York: ACM, 2004: 137- 150. 被引量:1

共引文献20

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部