
Research on Small Files Optimized Storage Strategy in Hadoop System (cited by: 5)
Abstract: With the arrival of the "big data" era, big data processing platforms such as Hadoop have emerged. However, their storage layer, the Hadoop Distributed File System (HDFS), handles massive numbers of small files poorly: storing them raises the load on the whole cluster and lowers its operating efficiency. The usual remedy is to merge small files and store the resulting large files instead, but previous methods do not exploit the distribution of file sizes and therefore leave room to improve the merging result. This paper proposes a small-file merging algorithm based on data block balancing. It optimizes the size distribution of the merged large files and effectively reduces the number of HDFS data blocks, which in turn reduces memory consumption on the cluster's master node, lowers its load, and lets data processing run more efficiently.
Source: Intelligent Computer and Applications (《智能计算机与应用》), 2015, No. 3, pp. 28-32, 36 (6 pages)
Keywords: HDFS; storage of small files; small-file merging algorithm
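
The abstract does not spell out the paper's block-balancing algorithm, so the following is only an illustrative sketch of the general idea and not the authors' method. It uses first-fit decreasing bin packing, a standard heuristic chosen here as a stand-in, to group small files into merged files that each fill at most one HDFS block. The names `BLOCK_SIZE` and `pack_first_fit_decreasing` are hypothetical, and the 128 MB block size is an assumption (the actual value is set by `dfs.blocksize` on a given cluster).

```python
from typing import List

# Assumed HDFS block size in bytes; the real value comes from dfs.blocksize.
BLOCK_SIZE = 128 * 1024 * 1024


def pack_first_fit_decreasing(sizes: List[int], capacity: int = BLOCK_SIZE) -> List[List[int]]:
    """Group small-file sizes into merged files of at most `capacity` bytes.

    Illustrative stand-in only: first-fit decreasing bin packing, not the
    algorithm from the paper. Each returned group is one merged file.
    """
    groups: List[List[int]] = []
    free: List[int] = []  # remaining capacity of each merged file
    for size in sorted(sizes, reverse=True):  # largest files first
        if size > capacity:
            raise ValueError("file exceeds one block and is not a small file")
        for i, room in enumerate(free):
            if size <= room:  # first merged file with enough room
                groups[i].append(size)
                free[i] -= size
                break
        else:  # no existing merged file fits, so open a new one
            groups.append([size])
            free.append(capacity - size)
    return groups


# Toy example with a capacity of 100 units instead of a real block size.
merged = pack_first_fit_decreasing([60, 50, 40, 30, 20], capacity=100)
print(merged)       # [[60, 40], [50, 30, 20]]
print(len(merged))  # 2 merged files, versus 5 separate small files
```

In this toy run, five small files that would each occupy their own block and their own metadata entries on the master node are packed into two completely filled merged files, which is the kind of block reduction the abstract describes.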

