Data random access method oriented to HDFS (cited by: 5)
Abstract: To simplify the file system implementation and support streaming access to very large data sets, HDFS sacrifices random access to files, yet many real-world applications need it. Based on an in-depth analysis of how HDFS reads and writes data, this paper proposes a data random access method oriented to HDFS. The design adds a local data access interface to the Datanode, so that a user program can read the block files stored on the Datanode and write data into the Datanode's block storage directory. The first replica of a file is written directly to the local Datanode by the user program; the remaining replicas are generated on other Datanodes by data replication after the first replica has been completely written. In addition, permission management is added for data blocks: the file replicas on a Datanode are owned by the user, and when a file's permissions change in the namespace, the permissions of its corresponding blocks change as well. Tests show that data read performance improves by about 10% and data write performance by more than 20%, and that under high concurrency write performance can improve by up to 2.5 times.
Authors: LI Qiang (李强), SUN Zhenyu (孙震宇), SUN Gongxing (孙功星) (Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China; University of Chinese Academy of Sciences, Beijing 100049, China)
Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD, PKU Core Journal, 2017, No. 10, pp. 1-7 (7 pages)
Funding: National Natural Science Foundation of China (No. 11375223, No. 11375221)
Keywords: Hadoop Distributed File System; random access; permission management
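
The abstract describes reading block files directly from the Datanode's local storage. As a rough illustration only, and not the authors' implementation, the sketch below shows how a file-level byte offset could be mapped to a block index and an offset inside that block, and how a read could then seek into local block files. The tiny `BLOCK_SIZE`, the `blk_<index>` naming scheme, and all function names are assumptions made for this example; real Datanode block files are named and laid out differently.

```python
import os
import tempfile

BLOCK_SIZE = 8  # hypothetical tiny block size for the demo; HDFS blocks are far larger


def locate(offset, block_size=BLOCK_SIZE):
    """Map a file-level byte offset to (block index, offset inside that block)."""
    return divmod(offset, block_size)


def block_path(block_dir, index):
    # Hypothetical naming scheme; real Datanode block file names differ.
    return os.path.join(block_dir, f"blk_{index}")


def random_read(block_dir, offset, length, block_size=BLOCK_SIZE):
    """Read up to `length` bytes starting at file offset `offset` from local block files."""
    chunks = []
    while length > 0:
        index, inner = locate(offset, block_size)
        path = block_path(block_dir, index)
        if not os.path.exists(path):
            break  # the requested range runs past the last block
        with open(path, "rb") as f:
            f.seek(inner)
            data = f.read(min(length, block_size - inner))
        if not data:
            break  # reached the end of a short final block
        chunks.append(data)
        offset += len(data)
        length -= len(data)
    return b"".join(chunks)


def write_blocks(block_dir, payload, block_size=BLOCK_SIZE):
    """Split `payload` into local block files, standing in for a first replica."""
    for start in range(0, len(payload), block_size):
        with open(block_path(block_dir, start // block_size), "wb") as f:
            f.write(payload[start:start + block_size])
```

For instance, after `write_blocks(tempfile.mkdtemp(), data)`, a call to `random_read` with an offset near a block boundary reads the tail of one block file and the head of the next. Replication of the remaining replicas and the block permission handling mentioned in the abstract are outside the scope of this sketch.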