
Study and Realisation of a Fast Parallel Import Tool for Very-Large Data

Cited by: 1
Abstract: With the rapid growth of large-scale data and the demand for high reliability, migrating local data to a distributed database is becoming unavoidable. To address this, the paper proposes a MapReduce-based "fast parallel import" technique. It exploits the parallel computing capacity of the cluster to write data directly into HFile, the underlying storage file of HBase, which avoids the time spent in the upper-layer import path and reduces resource overhead. This addresses the limited functionality and low efficiency of importing data from a single-machine database into the distributed HBase database. Experimental results show that the fast parallel import tool designed and implemented on the basis of this technique supports fast import of text data with multiple column families. Compared with the traditional approach of importing data through the API, it more than doubles the import speed.
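The data flow described in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' tool: it is plain Python with no HBase or Hadoop calls, and the column layout `COLUMNS`, the comma-separated input format, and the function names `map_line` and `bulk_prepare` are all hypothetical. It shows the general idea behind writing store files directly: parse each text line into cells, route each cell to a region by row key, and emit one sorted output per region and column family.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical layout: the first field is the row key, the remaining fields
# map in order to (column family, qualifier) pairs. Two families are used
# to mirror the multi-column-family case mentioned in the abstract.
COLUMNS = [("info", "name"), ("info", "city"), ("stats", "visits")]


def map_line(line, sep=","):
    """Map step: turn one text line into (row, family, qualifier, value) cells."""
    row_key, *fields = line.rstrip("\n").split(sep)
    return [(row_key, fam, qual, val) for (fam, qual), val in zip(COLUMNS, fields)]


def bulk_prepare(lines, split_keys):
    """Group cells by (region index, column family) and sort each group.

    split_keys is an assumed sorted list of region boundary keys; a row
    equal to a boundary key belongs to the region that starts at that key.
    Each sorted group stands in for one directly written store file.
    """
    files = defaultdict(list)
    for line in lines:
        for row, fam, qual, val in map_line(line):
            region = bisect_right(split_keys, row)  # shuffle/partition step
            files[(region, fam)].append((row, qual, val))
    # Reduce step: store files must be ordered by key, so sort each group.
    return {key: sorted(cells) for key, cells in files.items()}
```

For example, `bulk_prepare(["r3,Ann,Oslo,7", "r1,Bob,Rome,2"], ["r2"])` places row `r1` in region 0 and row `r3` in region 1, with separate sorted outputs for the `info` and `stats` families. In a real cluster the sorted groups would be written as HFiles by parallel tasks and then handed to HBase, rather than held in memory.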
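Affiliation: Huanghe Science and Technology College (黄河科技学院)
Source: Computer Applications and Software (《计算机应用与软件》), CSCD, 2015, No. 9, pp. 26-30 (5 pages)
Funding: Key Science and Technology Research Project of the Education Department of Henan Province (12B520025); Zhengzhou Science and Technology Research Project (20120473); University-level Research Project (KYZR201006)
Keywords: Hadoop; HBase; MapReduce; distributed database; very-large data import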

References (8)

  • 1 Li Guojie, Cheng Xueqi. Big data research: a major strategic field for future science, technology, and socio-economic development: research status and scientific thinking on big data [J]. Bulletin of Chinese Academy of Sciences, 2012, 27(6): 647-657. Cited by: 1601
  • 2 Fay Chang, Jeffrey Dean, Sanjay Ghemawat, et al. Bigtable: A Distributed Storage System for Structured Data [C]//7th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2006: 205-218. Cited by: 1
  • 3 Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung. The Google File System [J]. The 19th ACM Symposium on Operating Systems Principles, 2003, 37(5): 29-43. Cited by: 1
  • 4 Jeffrey Dean, Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters [J]. Communications of the ACM, 2004, 51(1): 107-113. Cited by: 1
  • 5 Lombardi F, Pietro R D. Secure virtualization for cloud computing [J]. Journal of Network and Computer Applications, 2011, 34(4): 1113-1122. Cited by: 1
  • 6 Gilad Mishne, Jeff Dalton, Zhenghua Li, et al. Fast Data in the Era of Big Data: Twitter's Real-Time Related Query Suggestion Architecture [J]. eprint arXiv, 2012.10: 1210.7350. Cited by: 1
  • 7 Adam E. Silberstein, Russell Sears, Wenchao Zhou, et al. A batch of PNUTS: experiences connecting cloud batch and serving systems [C]//The 2011 ACM SIGMOD International Conference on Management of Data, 2011: 1101-1112. Cited by: 1
  • 8 ShaoMin Zhang, JingYan Wang, BaoYi Wang. Research on Data Integration of Smart Grid Based on IEC61970 and Cloud Computing [J]. 2012(139): 577-582. Cited by: 1


共引文献1600

同被引文献13

引证文献1

二级引证文献7

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部