
Materialization Strategies in Big Data Analysis System Based on Column-Store (基于列存储的大数据分析系统物化策略研究)

Cited by: 6
Abstract: Big data is characterized by volume, variety, and velocity, and by its reliance on commodity hardware and open-source software. In traditional relational databases, materialization can greatly speed up query processing. However, when traditional databases are applied to big data analysis, system performance degrades severely and gains in computational efficiency are limited. To address this problem, the paper proposes materialization strategies in MapReduce based on column-store (MSMC) for big data analysis systems. First, it introduces a MapReduce materialization cost estimation model and uses it to analyze the factors that affect materialization efficiency. Second, on this basis, it designs a column-store file format for the MapReduce distributed environment, the MapReduce column-store file (MCF), and applies a co-location (cooperative localization) strategy during data loading to optimize the storage of materialized data. Third, according to the different materialization time windows, it builds three strategies: the MapReduce early materialization strategy (MEMS), the MapReduce late materialization strategy (MLMS), and the MapReduce early-late materialization strategy (MELMS). Fourth, to avoid uncontrolled expansion of the materialization sets, it designs the adaptive materialization sets adjustment strategy (AMSAS), which further optimizes MSMC. Finally, experiments evaluate execution time and load capacity. The results show that MSMC combined with AMSAS effectively reduces MapReduce intermediate data processing, network bandwidth consumption, and unnecessary I/O, and that it also performs well in storage space and load capacity, which verifies the effectiveness of the proposed method for big data analysis.
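The difference between early and late materialization that the abstract refers to can be illustrated with a minimal single-machine sketch. It is not the paper's MEMS or MLMS and involves no MapReduce; the toy table, the column names `a`, `b`, `c`, and the `values_read` counter are assumptions made purely for illustration. The query is `SELECT a, c WHERE b > threshold`.

```python
from typing import Dict, List, Tuple

# Toy column store: each column is a list, and row i is the i-th entry of
# every column. The table and column names are hypothetical.
Columns = Dict[str, List[int]]
Result = Tuple[List[Tuple[int, int]], int]


def early_materialization(cols: Columns, threshold: int) -> Result:
    """Stitch full tuples first, then filter them."""
    rows = list(zip(cols["a"], cols["b"], cols["c"]))  # tuples built for every row
    values_read = 3 * len(rows)                        # all three columns fully read
    result = [(a, c) for a, b, c in rows if b > threshold]
    return result, values_read


def late_materialization(cols: Columns, threshold: int) -> Result:
    """Filter on column b alone, keep positions, then fetch a and c by position."""
    positions = [i for i, b in enumerate(cols["b"]) if b > threshold]
    values_read = len(cols["b"]) + 2 * len(positions)  # b fully, a and c only for matches
    result = [(cols["a"][i], cols["c"][i]) for i in positions]
    return result, values_read


table: Columns = {
    "a": [1, 2, 3, 4, 5],
    "b": [10, 50, 20, 60, 30],
    "c": [100, 200, 300, 400, 500],
}
early_rows, early_reads = early_materialization(table, 40)
late_rows, late_reads = late_materialization(table, 40)
print(early_rows, early_reads)
print(late_rows, late_reads)
```

Both functions return the same rows. In this toy setting the late variant touches fewer column values when the predicate is selective, which is the kind of trade-off a materialization cost model would need to weigh.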
Source: 《计算机研究与发展》 (Journal of Computer Research and Development), indexed in EI, CSCD, and the Peking University Core Journals list; 2015, Issue 5, pp. 1061-1070 (10 pages)
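Funding: National Natural Science Foundation of China (61103046); Fundamental Research Funds for the Central Universities (Donghua University "励志计划" project, B201312); Scientific Research Fund of the Zhejiang Provincial Education Department (Y201225326, Y201432374)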
Keywords: big data; column-store; materialization strategy (MS); MapReduce; analysis system

