
Bushy Tree and Improved-McCHyp Algorithm Based Impala Query Optimization

Cited by: 1
Abstract: To address shortcomings in query optimization in Impala, a real-time big-data query system, we propose an Impala query optimization method based on bushy trees and an improved MinCutConservative Hypergraph (Improved-McCHyp) algorithm. First, the method modifies Impala to support bushy-tree query plans. Next, it improves the McCHyp algorithm with a pruning strategy to reduce query optimization time. Finally, we propose a cost model suited to Impala and integrate the Improved-McCHyp algorithm into Impala, so that a better query plan is generated from the user's SQL statement. The method is implemented in Impala and evaluated on the TPC-H dataset. Experimental results show that Improved-McCHyp produces the same optimization results as McCHyp on join hypergraphs, while its running time is 43.82% to 62.55% lower. Moreover, after queries are optimized with Improved-McCHyp and the new cost model, query response time is 79.60% lower than with the original Impala system.
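The abstract does not describe the pruning strategy in detail. As a rough illustration of the general idea only, the sketch below shows memoized top-down join enumeration that builds bushy plans, together with a simple cost-bound pruning rule: once the output cardinality of a relation set plus the cost of one subplan already reaches the best cost found for that set, the other subplan is never optimized. This is a simplification and an assumption throughout. It uses a plain join graph rather than a hypergraph, naive generate-and-test partitioning instead of McCHyp's min-cut partitioning, and the textbook C_out cost model (sum of intermediate result sizes) rather than the paper's Impala cost model. The class name, cardinalities, and selectivities are all hypothetical.

```python
from itertools import combinations
import math


class TopDownJoinEnumerator:
    """Hypothetical memoized top-down join enumerator over a simple join graph."""

    def __init__(self, cards, sels, prune=True):
        self.cards = cards    # relation id -> estimated row count (made-up numbers)
        self.sels = sels      # frozenset({a, b}) -> selectivity of join edge a-b
        self.prune = prune    # enable the cost-bound pruning rule
        self.memo = {}        # frozenset of relations -> (cost, join tree)
        self.solved = 0       # number of relation sets actually optimized

    def card(self, s):
        # Output cardinality of joining all relations in s (independence assumption).
        rows = 1.0
        for r in s:
            rows *= self.cards[r]
        for edge, sel in self.sels.items():
            if edge <= s:
                rows *= sel
        return rows

    def connected(self, s):
        # Depth-first search restricted to the relations in s.
        start = min(s)
        seen, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for edge in self.sels:
                if u in edge:
                    (v,) = edge - {u}
                    if v in s and v not in seen:
                        seen.add(v)
                        stack.append(v)
        return seen == set(s)

    def partitions(self, s):
        # Generate-and-test: yield each unordered split (s1, s2) once by keeping the
        # smallest relation in s1. Since s is connected, two connected halves are
        # always linked by some join edge, so no cross product is produced.
        first, *rest = sorted(s)
        for k in range(len(rest)):
            for combo in combinations(rest, k):
                s1 = frozenset((first, *combo))
                s2 = s - s1
                if self.connected(s1) and self.connected(s2):
                    yield s1, s2

    def best(self, s):
        # Cheapest bushy plan for the connected relation set s under C_out.
        if s in self.memo:
            return self.memo[s]
        self.solved += 1
        if len(s) == 1:
            result = (0.0, min(s))  # a base-table scan adds no join cost
        else:
            out = self.card(s)      # identical for every split of s
            best_cost, best_tree = math.inf, None
            for s1, s2 in self.partitions(s):
                c1, t1 = self.best(s1)
                # Pruning: costs are non-negative, so if out + c1 already reaches the
                # best cost found so far, this split cannot win; skip optimizing s2.
                if self.prune and out + c1 >= best_cost:
                    continue
                c2, t2 = self.best(s2)
                if out + c1 + c2 < best_cost:
                    best_cost, best_tree = out + c1 + c2, (t1, t2)
            result = (best_cost, best_tree)
        self.memo[s] = result
        return result


# Hypothetical four-relation join graph.
cards = {0: 1_000_000, 1: 10_000, 2: 100, 3: 50_000}
sels = {
    frozenset({0, 1}): 1e-4,
    frozenset({1, 2}): 1e-2,
    frozenset({1, 3}): 1e-5,
    frozenset({0, 3}): 1e-6,
}
all_rels = frozenset(cards)
pruned = TopDownJoinEnumerator(cards, sels, prune=True)
full = TopDownJoinEnumerator(cards, sels, prune=False)
print("pruned:", pruned.best(all_rels), "sets optimized:", pruned.solved)
print("full:  ", full.best(all_rels), "sets optimized:", full.solved)
```

Because the bound uses only values that are already computed and costs are never negative, the pruned run returns the same optimal cost as the exhaustive run while optimizing at most as many relation sets. In a much simplified form, this mirrors the kind of property the abstract reports: identical optimization results with less optimization work.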
Source: Journal of Computer Research and Development (《计算机研究与发展》), 2014, Issue S2, pp. 39-47 (9 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.
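Funding: National Science and Technology Major Project on Core Electronic Devices, High-End Generic Chips and Basic Software ("核高基") (2010ZX01042-002-003); National Natural Science Foundation of China (60703040, 61332017); Major Science and Technology Special Project of Zhejiang Province (2011C13042, 2013C01046); China Knowledge Centre for Engineering Sciences and Technology (CKCEST-2014-1-5).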
Keywords: query optimization; Impala; cost model; bushy tree; query plan