Abstract
Join-aggregation queries are among the core operators of large-scale data analysis. Multi-core processors offer room to optimize such queries, but exploiting them remains a major challenge, especially in a distributed computing environment. This paper studies optimization algorithms for join-aggregation queries over large-scale data under the MapReduce framework on multi-core processor clusters. First, we design and implement a single-threaded Map-side join-aggregation query algorithm on the traditional MapReduce framework; experimental analysis of this algorithm shows that combining MapReduce with multi-core processors is necessary to improve performance. Second, targeting the multi-core architecture, we design and implement a multi-threaded Map-side join-aggregation query algorithm and identify the performance bottleneck that arises when MapReduce is combined with multi-core processors. Finally, we propose a multi-threaded join-aggregation query algorithm in which Map-side threads read the input split without contention, allowing MapReduce to take full advantage of multi-core processors. Experimental results show that the proposed algorithm makes full use of the hardware for large-scale join-aggregation queries and achieves good time performance and scalability.
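The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea of a contention-free, multi-threaded Map-side join-aggregation, not the authors' implementation. It assumes that the split is already in memory as a list of `(key, amount)` records, that the join partner is a small in-memory dimension table, and that the aggregate is a sum per group. All function names are hypothetical. Each thread is given a disjoint index range of the split, so no two threads read the same record or share a mutable aggregate; partial results are merged only at the end.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_ranges(n_records, n_threads):
    """Cut [0, n_records) into disjoint, contiguous ranges (hypothetical helper)."""
    step = -(-n_records // n_threads)  # ceiling division
    return [(lo, min(lo + step, n_records)) for lo in range(0, n_records, step)]


def join_aggregate_range(facts, lo, hi, dim):
    """Join facts[lo:hi] with the dimension table and sum amounts per group."""
    partial = Counter()  # private to this worker, so no locking is needed
    for i in range(lo, hi):
        key, amount = facts[i]
        group = dim.get(key)      # hash-join probe
        if group is not None:     # inner join: unmatched keys are dropped
            partial[group] += amount
    return partial


def map_side_join_aggregate(facts, dim, n_threads=4):
    """Run one worker per disjoint range, then merge the partial aggregates."""
    if not facts:
        return {}
    ranges = split_ranges(len(facts), n_threads)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        partials = pool.map(
            lambda r: join_aggregate_range(facts, r[0], r[1], dim), ranges
        )
        total = Counter()
        for p in partials:
            total.update(p)       # Counter.update adds the counts together
    return dict(total)


if __name__ == "__main__":
    facts = [(1, 10), (2, 5), (1, 7), (3, 2), (9, 100)]  # (key, amount)
    dim = {1: "east", 2: "west", 3: "east"}              # key -> group
    print(map_side_join_aggregate(facts, dim, n_threads=3))
```

In CPython the global interpreter lock prevents these threads from running CPU-bound work in parallel, so the sketch illustrates only the partitioning and merge structure; a real Map task would typically be written in Java against the Hadoop API.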
Source
《计算机研究与发展》
EI
CSCD
Peking University Core Journal
2015, Issue S1, pp. 9-18 (10 pages)
Journal of Computer Research and Development
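Funding
National Natural Science Foundation of China (61462017, 61363005)
Guangxi Natural Science Foundation (2014GXNSFAA118353, 2014GXNSFAA118390, 2014GXNSFDA118036)
Scientific Research Funding Project of Guangxi Universities (2013YB083)
Guangxi Key Laboratory of Automatic Detecting Technology and Instruments Fund (YQ15110, YQ14109)
Guilin University of Electronic Technology Graduate Innovation Project (GDYCSZ201465)
Guangxi Higher Education High-Level Innovation Team and Outstanding Scholars Program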
Keywords
query optimization
distributed processing
join-aggregation query
multi-core processor
contention-free read