Optimization Strategy for Handling Small Files on Hadoop
Abstract: The Hadoop Distributed File System (HDFS) is an open-source system widely used in storage services, offering high fault tolerance, easy scalability, and low-cost storage. However, HDFS relies on a single server, the NameNode, to manage all metadata. When it handles massive numbers of small files, the NameNode consumes excessive memory and both storage and read performance suffer, so the NameNode becomes the system bottleneck. This paper proposes an optimization mechanism based on Hadoop Archive (HAR) that improves the memory efficiency of metadata storage on the NameNode and the efficiency of accessing small files. The strategy also extends HAR so that files can be appended to an existing archive, and it adopts an index prefetching (preload) mechanism to speed up access. Experimental results show that the strategy improves on the existing HAR's ability to handle small files and on the efficiency of accessing large numbers of small files.
Authors: Zuo Dapeng (左大鹏), Xu Wei (徐薇)
Source: Software (《软件》), 2015, No. 2, pp. 107-111 (5 pages)
Keywords: HDFS; small files; HAR; index strategy; index preload
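The abstract names three ideas: packing small files into one archive, appending to an existing archive, and preloading the index so reads skip repeated index lookups. The sketch below is a toy in-memory model of those ideas and makes no claim to be the authors' implementation or the real HAR on-disk format. The names `ToyArchive`, `PreloadedReader`, `read_cold`, and `refresh` are all hypothetical, chosen for illustration only.

```python
class ToyArchive:
    """Toy stand-in for an archive: one data blob plus a persisted index.

    Assumption: the index is a list of "name offset length" lines,
    loosely imitating an index file stored next to the data.
    """

    def __init__(self):
        self.data = bytearray()
        self.index_lines = []
        self.index_scans = 0  # counts full passes over the persisted index

    def append(self, name, payload):
        """Add one small file to the existing archive without rebuilding it."""
        self.index_lines.append(f"{name} {len(self.data)} {len(payload)}")
        self.data.extend(payload)

    def scan_index(self):
        """Parse the whole persisted index into a dict (the costly step)."""
        self.index_scans += 1
        parsed = {}
        for line in self.index_lines:
            name, offset, length = line.rsplit(" ", 2)
            parsed[name] = (int(offset), int(length))
        return parsed

    def read_cold(self, name):
        """Read without a preloaded index: every read rescans the index."""
        offset, length = self.scan_index()[name]
        return bytes(self.data[offset:offset + length])


class PreloadedReader:
    """Reader that loads the index once and serves reads from memory."""

    def __init__(self, archive):
        self.archive = archive
        self.index = archive.scan_index()

    def refresh(self):
        """Reload the index, for example after files were appended."""
        self.index = self.archive.scan_index()

    def read(self, name):
        offset, length = self.index[name]
        return bytes(self.archive.data[offset:offset + length])


if __name__ == "__main__":
    archive = ToyArchive()
    for i in range(1000):
        archive.append(f"file-{i}.txt", f"payload {i}".encode())

    reader = PreloadedReader(archive)
    for i in range(1000):
        reader.read(f"file-{i}.txt")
    print("index scans with preload:", archive.index_scans)
```

In this model a thousand reads through `PreloadedReader` cost a single index scan, whereas `read_cold` would scan the index once per read. A reader created before an append has to call `refresh()` to see the new entries, which is the kind of bookkeeping an append-capable archive needs to handle.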
