
Join-Aggregation Query Optimization by Integrating Multi-Cores and MapReduce (融合多核和MapReduce的连接聚集查询优化)
Cited by: 1
Abstract: Join-aggregation queries are among the core operators of large-scale data analysis. Multi-core processors offer room to optimize such queries over large data sets, but exploiting them remains a significant challenge, especially in a distributed computing environment. This paper studies optimization algorithms for large-scale join-aggregation queries under the MapReduce framework on clusters of multi-core processors. First, a single-threaded Map-side join-aggregation algorithm is designed and implemented on the conventional MapReduce framework, and experimental analysis shows that integrating MapReduce with multi-core processors is necessary to improve performance. Second, a multi-threaded Map-side join-aggregation algorithm targeting the multi-core architecture is designed and implemented, which exposes the performance bottleneck in combining MapReduce with multi-core processors. Finally, a multi-threaded join-aggregation algorithm is proposed in which Map-side threads read the input split without contention, allowing MapReduce to take full advantage of multi-core processors. Experimental results show that the proposed algorithm exploits the hardware well for large-scale join-aggregation queries and achieves good running time and scalability.
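Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list; 2015, Issue S1, pp. 9-18 (10 pages)
Funding: National Natural Science Foundation of China (61462017, 61363005); Guangxi Natural Science Foundation (2014GXNSFAA118353, 2014GXNSFAA118390, 2014GXNSFDA118036); Guangxi University Scientific Research Project (2013YB083); Guangxi Key Laboratory of Automatic Detecting Technology and Instruments Fund (YQ15110, YQ14109); Guilin University of Electronic Technology Graduate Innovation Project (GDYCSZ201465); Guangxi Higher Education High-Level Innovation Team and Outstanding Scholar Program
Keywords: query optimization; distributed processing; join-aggregation query; multi-core processor; contention-free read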
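
The abstract describes the contention-free read only at a high level, so the following is a minimal illustrative sketch and not the paper's algorithm. It assumes a map-side hash join of one input split against a small in-memory table, followed by a sum per group. All names (`aggregate_range`, `map_side_join_aggregate`, `fact_rows`, `dim_table`) are hypothetical. Each worker thread is given a disjoint index range of the split and a private partial-aggregate map, so workers never compete for the same records or the same counters; the partial results are merged once at the end. In CPython the global interpreter lock prevents a real speedup here, so the sketch shows the structure only.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def aggregate_range(fact_rows, start, end, dim_table):
    """Join fact_rows[start:end] with dim_table and sum amounts per group."""
    partial = defaultdict(int)  # private to this worker, so no locking is needed
    for i in range(start, end):
        key, amount = fact_rows[i]
        group = dim_table.get(key)  # hash-join probe against the small table
        if group is not None:       # inner join: unmatched rows are dropped
            partial[group] += amount
    return partial


def map_side_join_aggregate(fact_rows, dim_table, num_threads=4):
    """Assign disjoint ranges of one split to threads, then merge the partials."""
    n = len(fact_rows)
    if n == 0:
        return {}
    chunk = (n + num_threads - 1) // num_threads  # ceiling division
    # Non-overlapping [start, end) ranges: no two threads read the same record.
    ranges = [(s, min(s + chunk, n)) for s in range(0, n, chunk)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        partials = list(
            pool.map(lambda r: aggregate_range(fact_rows, r[0], r[1], dim_table), ranges)
        )
    merged = defaultdict(int)
    for partial in partials:  # single-threaded merge of per-thread results
        for group, total in partial.items():
            merged[group] += total
    return dict(merged)
```

For example, with `dim_table = {1: "east", 2: "west"}` and `fact_rows = [(1, 10), (2, 5), (1, 7), (3, 100), (2, 1)]`, the row with key 3 has no join partner and is dropped, and the remaining amounts are summed per region regardless of how many threads are used.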