期刊文献+

海量空间数据的并行Top-k连接查询 被引量:7

Parallel Top-k Spatial Join Query Processing on Massive Spatial Data
下载PDF
导出
摘要 在许多空间应用领域中,Top-k空间连接查询是一种十分重要的操作,指定两个空间关系R和S,Top-k空间连接查询从R或S中返回k个与其他空间关系具有最大交叠数的结果.不同于Top-k查询,Top-k空间连接查询先执行空间连接操作,然后才执行Top-k查询.由于空间数据的海量特性和复杂性,传统的单机串行处理需要很长时间甚至不能完成.提出了一种新颖的基于MapReduce的Top-k空间连接查询处理算法TKSJMR.该算法在并行空间连接阶段执行部分聚集操作,减少数据写入和数据传输;在Top-k结果获取阶段提出一种Top-k结果获取算法,将结果聚集和Top-k结果获取缩减为一个阶段,减少MapReduce执行步骤.实验结果表明,该算法不仅在有效时间内解决单机上难以解决的海量空间数据的Top-k连接查询问题,并且TKSJMR在Top-k查询处理阶段性能提升了约50%. 在许多空间应用领域中,Top-k空间连接查询是一种十分重要的操作,指定两个空间关系R和S,Top-k空间连接查询从R或S中返回k个与其他空间关系具有最大交叠数的结果.不同于Top-k查询,Top-k空间连接查询先执行空间连接操作,然后才执行Top-k查询.由于空间数据的海量特性和复杂性,传统的单机串行处理需要很长时间甚至不能完成.提出了一种新颖的基于MapReduce的Top-k空间连接查询处理算法TKSJMR.该算法在并行空间连接阶段执行部分聚集操作,减少数据写入和数据传输;在Top-k结果获取阶段提出一种Top-k结果获取算法,将结果聚集和Top-k结果获取缩减为一个阶段,减少MapReduce执行步骤.实验结果表明,该算法不仅在有效时间内解决单机上难以解决的海量空间数据的Top-k连接查询问题,并且TKSJMR在Top-k查询处理阶段性能提升了约50%.
出处 《计算机研究与发展》 EI CSCD 北大核心 2011年第S3期163-172,共10页 Journal of Computer Research and Development
基金 国家"八六三"高技术研究发展计划基金项目(2008AA12A211 2011AA120306) 国家自然科学基金项目(40801160 60902036)
关键词 Top-k空间连接 MAPREDUCE 冗余避免 Top-k spatial join MapReduce duplicate avoidance
  • 相关文献

参考文献16

  • 1王鹏,孟丹,詹剑锋,涂碧波.数据密集型计算编程模型研究进展[J].计算机研究与发展,2010,47(11):1993-2002. 被引量:39
  • 2Brinkhoff T,Kriegel H P,Seeger B.Efficient processing of spatial joins using R-trees. ACM SIGMOD Int. Conf. on Management of Data . 1993 被引量:1
  • 3G. Luo,J. Naughton,C. Ellman.A Non-Blocking Parallel Spatial Join Algorithm. Proc International Conference on Data Engineering . 2002 被引量:1
  • 4Dittrich,J -P,B. Seeger.Data Redundancy and Duplicate Detection in Spatial Join Processing. Proceedings of the 16th International Conference on Data Engineering . 2000 被引量:1
  • 5Tao Y,Papadias D.Range Aggregate Processing in Spatial Databases. IEEE Transactions on Knowledge and Data Engineering . 2004 被引量:1
  • 6Zhang D,Tsotras V J,Gunopulos D.Efficient Aggregation over Objects with Extent. Proceedings of the 21th ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems . 2002 被引量:1
  • 7M Zhu.Top-k spatial joins. IEEE Transactions on Knowledge and Data Engineering . 2005 被引量:1
  • 8Jeffrey Dean,Sanjay Ghemawat.MapReduce:Simplified Data Processing on Large Clusters. OSDI’’04:Sixth Symposium on Operating System Design andImplementation . 2004 被引量:1
  • 9Patel JM,DeWitt DJ.Partition based spatial-merge join. SIGMOD Record . 1996 被引量:1
  • 10Arge L,Procopiuc O,Ramaswamy S et al.Scalable sweeping-based spatial join. . 1998 被引量:1

二级参考文献30

  • 1Wikipedia. Cloud computing [EB/OL]. [ 2008-11 -16 ]. http ://en. wikipedia, org/wiki/Cloud computing. 被引量:1
  • 2Ghemawat S, Gobioff H, Leung S. The Google file system [C] //Proc of the 19th ACM Symp on Operating System Principles(SOSP). New York, ACM, 2003:29-43. 被引量:1
  • 3Dean J, Ghemawat S. MapReduee: Simplified data processing on large clusters [C] //Proc of the 6th USENIX Symp on Operating Systems Design and Implementation (OSDI). San Francisco: USENIX Association, 2004: 137- 150. 被引量:1
  • 4Chang F, Dean J, Ghemawat S. et al. Bigtable: A distributed storage system for structured data [C] //Proc of the 7th USENIX Syrup on Operating Systems Design and Implementation(OSDI). San Francisco: USENIX Association, 2006:205-218. 被引量:1
  • 5Amazon Web Services. Amazon Elastic Compute Cloud [EB/OL]. [2008-12-01]. http://aws, amazon, com/cc2/. 被引量:1
  • 6Amazon Web Services. Amazon Simple Storage Service [EB/OL]. [2008- 12-01]. http://aws, amazon, com/s3/. 被引量:1
  • 7Patterson D, Technical perspective: The data center is the computer[J]. Communications of the ACM, 2008, 51(1) 105-105. 被引量:1
  • 8Bryant R. Data intensive supercomputing: The case for DISC [R/OL]. [2008-12- 10]. http://www, cs. cmu. edu/-bryant/ pubdir/emu cs 07-128. pdf. 被引量:1
  • 9Bell G, Gray J, Alex S. Petascale computational systems: Balanced cyberInfrastructure in a data centric world [J].Computer, 2006, 39(1): 110-112. 被引量:1
  • 10Newman H, Ellisman M, Orcutt J. Data-intensive e-science frontier research [J]. Communications of the ACM, 2003, 46(11) :68-77. 被引量:1

共引文献38

同被引文献38

  • 1SHIM K, SRIKANT R, AGRAWAL R. High-dimensional similarity joins [ J]. IEEE Transactions on Knowledge and Data Engineering, 2002, 14(1): 156-171. 被引量:1
  • 2BOHM C, BRAUNMCdLLER B, KREBS F, et al. Epsilon grid or- der: an algorithm for the similarity join on massive high-dimensional data [ C]// Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. New York: ACM, 2001:379 -388. 被引量:1
  • 3KALASHNIKOV D. Super-EGO: fast multi-dimensional similarity join [J]. The VLDB Journal, 2013, 22(4): 561 -585. 被引量:1
  • 4DEAN J, GHEMAWAT S. MapReduee: simplified data processing on large clusters [ C]// Proceedings of the 6th USENIX Symposium on Operating Systems Design and Implementation. San Francisco: USENIX Association, 2004:137 - 150. 被引量:1
  • 5SEIDL T, FRIES S, BODEN B. MR-DSJ: distance-based self-join for large-scale vector data analysis with MapReduce [ C]//Proceed- ings of the 15th BTW Conference on Database Systems for Business, Technology, and Web. Berlin: Springer, 2013:37-56. 被引量:1
  • 6FRIES S, BODEN B, STEPIEN G, et al. PHiDJ: parallel similarity self-join for high-dimensional vector data with MapReduce [ C]// Proceedings of the 30th IEEE International Conference on Data Engi- neering. Piscataway, NJ: IEEE, 2014:796-807. 被引量:1
  • 7LUO W, TAN H, MAO H, et al. Efficient similarity joins on mas- sive high-dimensional datasets using MapReduce [ C]// Proceedings of the 13th IEEE International Conference on Mobile Data Manage- ment. Piscataway, NJ: IEEE, 2012: 1-10. 被引量:1
  • 8LU W, SHEN Y, CHEN S, et al. Efficient processing of k nearest neighbor joins using MapReduce [ J]. Proceedings of the VLDB Endowment, 2012, 5(10) : 1016 - 1027. 被引量:1
  • 9ZHANG C, LI F, JESTES J. Efficient parallel kNN joins for large data in MapReduce [ C]// Proceedings of the 15th International Conference on Extending Database Technology. New York: ACM, 2012:38-49. 被引量:1
  • 10VERNICA R, CAREY M, LI C. Efficient parallel set-similarity joins using MapReduce [ C]// Proceedings of the ACM SIGMOD International Conference on Management of Data. New York: ACM, 2010:495-506. 被引量:1

引证文献7

二级引证文献35

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部