Bushy Tree and Improved-McCHyp Algorithm Based Impala Query Optimization

Cited by: 1


Abstract: To address query optimization problems in Impala, a real-time big-data query system, we propose an Impala query optimization method based on bushy trees and an improved MinCutConservative Hypergraph (McCHyp) algorithm. First, we modify Impala to support bushy-tree query plans. Next, we improve the McCHyp algorithm with a pruning strategy to reduce query optimization time. Finally, we propose a cost model suited to Impala and integrate the improved McCHyp algorithm into Impala so that it generates better query plans from users' SQL statements. We implemented the method in Impala and evaluated it on the TPC-H dataset. The results show that the improved McCHyp algorithm produces the same optimization results on join hypergraphs as the original McCHyp algorithm while reducing running time by 43.82% to 62.55%. In addition, after queries are optimized with the improved McCHyp algorithm and the new cost model, query response time is 79.60% lower than on the original Impala system.
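The idea of top-down join enumeration with pruning over bushy plans can be illustrated with a minimal sketch. This is not the paper's algorithm. It enumerates partitions of a simple join graph naively instead of using McCHyp's min-cut-based partitioning on a hypergraph, its pruning rule is a generic cost bound rather than the paper's pruning strategy, and its cost model (the sum of intermediate result sizes) is a textbook stand-in for the Impala-specific model. The catalog values `CARD` and `SEL` and all function names are hypothetical.

```python
from itertools import combinations
from math import inf

# Hypothetical toy catalog: base-table sizes and join-edge selectivities (chain A-B-C-D).
CARD = {"A": 1000, "B": 1000, "C": 1000, "D": 1000}
SEL = {frozenset("AB"): 0.0001, frozenset("BC"): 0.01, frozenset("CD"): 0.0001}

def connected(tables):
    """True if the tables form a connected subgraph of the join graph."""
    tables = set(tables)
    seen, stack = set(), [next(iter(tables))]
    while stack:
        t = stack.pop()
        if t in seen:
            continue
        seen.add(t)
        stack.extend(u for u in tables if u not in seen and frozenset((t, u)) in SEL)
    return seen == tables

def cardinality(tables):
    """Estimated result size: base sizes times the selectivities of contained edges."""
    size = 1.0
    for t in sorted(tables):
        size *= CARD[t]
    for edge, sel in SEL.items():
        if edge <= tables:
            size *= sel
    return size

def partitions(tables):
    """Yield each split (left, right) once, with both sides connected (no cross products)."""
    first, *rest = sorted(tables)
    for k in range(len(rest)):
        for combo in combinations(rest, k):
            left = frozenset((first, *combo))
            right = tables - left
            if connected(left) and connected(right):
                yield left, right

def best_plan(tables, memo, prune=True):
    """Return (cost, plan) for the cheapest join tree over tables; plans may be bushy."""
    if tables in memo:
        return memo[tables]
    if len(tables) == 1:
        (t,) = tables
        memo[tables] = (0.0, t)
        return memo[tables]
    best = (inf, None)
    out = cardinality(tables)
    for left, right in partitions(tables):
        lcost, lplan = best_plan(left, memo, prune)
        # Costs are non-negative, so lcost + out is a lower bound for this split.
        if prune and lcost + out >= best[0]:
            continue  # cannot beat the incumbent; skip optimizing the right side
        rcost, rplan = best_plan(right, memo, prune)
        total = lcost + rcost + out
        if total < best[0]:
            best = (total, (lplan, rplan))
    memo[tables] = best
    return best

if __name__ == "__main__":
    print(best_plan(frozenset(CARD), {}, prune=True))
```

Because the bound only discards splits that cannot improve on the best plan found so far, the pruned and unpruned searches return the same plan, which mirrors the abstract's claim that pruning reduces optimization time without changing the result.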

Authors: Ma Jiaoyang (马骄阳), Chen Ling (陈岭), Zhao Yuliang (赵宇亮), Yang Yi (杨谊), Wu Yong (吴勇), Wang Jingchang (王敬昌)

Affiliations: College of Computer Science and Technology, Zhejiang University; Zhejiang Hongcheng Computer Systems Co., Ltd.

Source: Journal of Computer Research and Development (《计算机研究与发展》; indexed in EI, CSCD, and the Peking University Core Journals list), 2014, Issue S2, pp. 39-47 (9 pages)

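Funding: National Science and Technology Major Project "Core Electronic Devices, High-end Generic Chips and Basic Software" (2010ZX01042-002-003); National Natural Science Foundation of China (60703040, 61332017); Zhejiang Provincial Major Science and Technology Project (2011C13042, 2013C01046); China Knowledge Centre for Engineering Sciences and Technology (CKCEST-2014-1-5)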

Keywords: query optimization; Impala; cost model; bushy tree; query plan

Classification: TP311.13 [Automation and Computer Technology: Computer Software and Theory]


