High Energy Physics Data Analysis System Based on MapReduce (cited by: 9)
Abstract: This paper introduces the MapReduce approach to high energy physics data analysis and proposes a data analysis system based on the Hadoop framework. By building a database of event TAG information, the system reduces the number of events that require further analysis by 2-3 orders of magnitude, which lowers I/O pressure and improves the efficiency of analysis jobs. Using an event pre-selection model based on TAG information and a MapReduce model of event analysis, the authors design MapReduce class libraries suited to the ROOT framework for data splitting, event reading, and result merging. The system was implemented for the BESIII experiment at the Beijing Electron Positron Collider and tested on an 8-node experimental cluster. The results show that the system shortens the analysis time for 4×10^6 events by 23%, and that as nodes are added, the number of events analyzed concurrently per second is roughly proportional to the number of cluster nodes, indicating that the event analysis cluster scales well.
Source: Computer Engineering (《计算机工程》), CAS, CSCD, 2014, No. 2, pp. 1-5 (5 pages)
Funding: Key Program of the National Natural Science Foundation of China (90912004)
Keywords: high energy physics; big data; data analysis; MapReduce model; cluster; distributed computing
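
The pipeline outlined in the abstract (TAG-based pre-selection, then data splitting, per-split event analysis, and result merging) can be illustrated with a small, self-contained Python sketch. This is not the authors' implementation: the real system builds on Hadoop and ROOT, while everything below, including the TAG fields `n_charged` and `total_energy`, the cut values, and the `read_event` stand-in, is a hypothetical assumption chosen only to show the flow of data.

```python
from collections import Counter
from functools import reduce

# Hypothetical TAG database: one small summary record per event.
# Field names and values are illustrative assumptions, not from the paper.
TAG_DB = [
    {"event_id": i, "n_charged": i % 5, "total_energy": 1.0 + 0.5 * (i % 7)}
    for i in range(1000)
]

def preselect(tag_db, min_charged, min_energy):
    """Pre-selection: return IDs of events whose TAG record passes the cuts."""
    return [
        t["event_id"]
        for t in tag_db
        if t["n_charged"] >= min_charged and t["total_energy"] >= min_energy
    ]

def split(ids, n_splits):
    """Data splitting: deal the selected event IDs into n_splits chunks."""
    return [ids[i::n_splits] for i in range(n_splits)]

def read_event(event_id):
    """Stand-in for reading a full event record (the expensive I/O step)."""
    return {"event_id": event_id, "mass": 3.0 + (event_id % 10) * 0.01}

def map_analyze(chunk):
    """Map step: analyze every event in one split, fill a partial histogram."""
    hist = Counter()
    for event_id in chunk:
        event = read_event(event_id)
        hist[round(event["mass"], 2)] += 1
    return hist

def reduce_merge(left, right):
    """Reduce step: merge two partial histograms by adding bin counts."""
    return left + right

def run(tag_db, n_splits=8, min_charged=4, min_energy=3.0):
    """Run pre-selection, then map over the splits and merge the results."""
    selected = preselect(tag_db, min_charged, min_energy)
    partials = [map_analyze(chunk) for chunk in split(selected, n_splits)]
    return selected, reduce(reduce_merge, partials, Counter())

if __name__ == "__main__":
    selected, merged = run(TAG_DB)
    print(f"{len(selected)} of {len(TAG_DB)} events passed pre-selection")
    print(f"merged histogram holds {sum(merged.values())} entries")
```

In this sketch only the events that pass the TAG cuts are ever handed to `read_event`, which mirrors the abstract's argument that pre-selection reduces I/O. The map calls run sequentially here; in a real cluster each split would be processed by a separate worker.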