Abstract
To address the shortcomings of query optimization in Impala, a real-time big-data query system, we propose an Impala query optimization method based on bushy trees and an improved MinCutConservative Hypergraph (McCHyp) algorithm. First, Impala is modified to support bushy-tree query plans. Next, the McCHyp algorithm is improved with a pruning strategy to reduce query optimization time. Finally, we propose a cost model suited to Impala and integrate the improved McCHyp algorithm into Impala, so that better query plans are generated from users' SQL statements. The method was implemented in Impala and evaluated on the TPC-H dataset. The results show that the improved McCHyp algorithm produces the same optimization results on join hypergraphs as the original McCHyp algorithm, while its running time is 43.82% to 62.55% lower. In addition, after queries are optimized with the improved McCHyp algorithm and the new cost model, query response time is 79.60% lower than in the original Impala system.
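The idea of searching bushy join trees under a cost model, with pruning to cut optimization time, can be illustrated with a minimal Python sketch. This is not the paper's improved McCHyp algorithm or its Impala cost model: it is a plain dynamic-programming enumeration over relation subsets, it uses the textbook C_out cost (the sum of intermediate result sizes), and its pruning is a simple cost bound. The function name `best_bushy_plan`, the cardinalities, and the selectivities are all hypothetical.

```python
from itertools import combinations


def best_bushy_plan(card, sel):
    """Find a cheap bushy join tree (illustrative only, not McCHyp).

    card: {relation name: row count}
    sel:  {frozenset({a, b}): join selectivity} for each join predicate
    Returns (cost, rows, plan), where plan is a nested tuple of names.
    """
    rels = sorted(card)
    # best[S] = (cost, rows, plan) of the cheapest plan found for set S
    best = {frozenset([r]): (0.0, float(card[r]), r) for r in rels}

    def joined_rows(left, right):
        # Estimated output rows; None means no predicate links the two
        # sides, so the join would be a cross product and is skipped.
        rows = best[left][1] * best[right][1]
        connected = False
        for a in left:
            for b in right:
                s = sel.get(frozenset((a, b)))
                if s is not None:
                    rows *= s
                    connected = True
        return rows if connected else None

    for size in range(2, len(rels) + 1):
        for subset in combinations(rels, size):
            target = frozenset(subset)
            # Both sides may hold several relations, which allows bushy trees.
            for k in range(1, size // 2 + 1):
                for left_names in combinations(subset, k):
                    left = frozenset(left_names)
                    right = target - left
                    if left not in best or right not in best:
                        continue
                    bound = best[target][0] if target in best else float("inf")
                    sub_cost = best[left][0] + best[right][0]
                    # Pruning (assumed strategy): the inputs alone already
                    # cost at least as much as the best known plan.
                    if sub_cost >= bound:
                        continue
                    rows = joined_rows(left, right)
                    if rows is None:
                        continue
                    cost = sub_cost + rows
                    if cost < bound:
                        best[target] = (cost, rows,
                                        (best[left][2], best[right][2]))
    return best[frozenset(rels)]


# Hypothetical chain query A - B - C - D
card = {"A": 100, "B": 10, "C": 10, "D": 100}
sel = {
    frozenset(("A", "B")): 0.1,
    frozenset(("B", "C")): 0.5,
    frozenset(("C", "D")): 0.1,
}
cost, rows, plan = best_bushy_plan(card, sel)
print(cost, plan)
```

For this chain query, the cheapest plan joins A with B and C with D before combining the two intermediate results. A left-deep planner cannot express that shape, which is the motivation for supporting bushy trees.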
Source
Journal of Computer Research and Development (《计算机研究与发展》)
Indexed in: EI, CSCD, PKU Core (北大核心)
2014, Issue S2, pp. 39-47 (9 pages)
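Funding
National Science and Technology Major Project of China, "Core Electronic Devices, High-end Generic Chips and Basic Software" (核高基) (2010ZX01042-002-003)
National Natural Science Foundation of China (60703040, 61332017)
Zhejiang Province Major Science and Technology Special Project (2011C13042, 2013C01046)
China Knowledge Centre for Engineering Sciences and Technology (CKCEST-2014-1-5)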