
The Fault-Tolerance Mechanism in a Grid-Based Distributed Simulation System

Cited by: 3
Abstract: To meet the needs of distributed simulation, this paper builds a general-purpose, grid-based fault-tolerance system for distributed simulation. The system consists of three parts: a simulation resource status monitoring module, a data saving module, and an error recovery module. Resource status monitoring is implemented on top of the grid's MDS, while data saving (covering both the process space and the interaction relationships between processes) and error recovery are implemented in user space using a checkpoint mechanism. The paper also analyzes how the added fault-tolerance mechanism relates to the existing functional modules of the simulation system. Finally, building on the grid and these fault-tolerance modules, it designs and implements a fault-tolerance broker in client/server mode that automates fault tolerance for the simulation system.
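Source: Journal of National University of Defense Technology (《国防科技大学学报》), 2005, No. 1, pp. 35-38 (4 pages). Indexed in: EI, CAS, CSCD, Peking University Core Journals.
Funding: Supported by a national ministry-level fund project (51404010403KG0155)
Keywords: HLA; fault tolerance; grid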
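
The abstract describes saving process state and inter-process interaction records with a checkpoint mechanism in user space, then recovering from them after a failure. The paper's own implementation is not shown in this record, so the following is only a minimal sketch of the general idea under simplifying assumptions: the "process space" is reduced to a small JSON-serializable dictionary, the interaction record is a list of unacknowledged messages, and every name (`save_checkpoint`, `load_checkpoint`, `run`, `peer-1`, the state fields) is hypothetical rather than taken from the paper.

```python
import json
import os
import tempfile


def save_checkpoint(state: dict, path: str) -> None:
    """Write the checkpoint atomically so a crash mid-write keeps the old one."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)  # atomic rename over the previous checkpoint
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_checkpoint(path: str) -> dict:
    """Return the last saved state, or a fresh state if no checkpoint exists."""
    if not os.path.exists(path):
        return {"step": 0, "value": 0, "unacked": []}
    with open(path) as f:
        return json.load(f)


def run(path: str, total_steps: int, fail_at=None) -> dict:
    """Advance a toy simulation, checkpointing after every step.

    `fail_at` injects a failure (an assumption for demonstration only):
    the run raises when it reaches that step, leaving the checkpoint behind.
    """
    state = load_checkpoint(path)
    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated node failure")
        state["step"] += 1
        state["value"] += 2
        # Record the interaction with a (hypothetical) peer process so it
        # can be replayed or re-acknowledged after recovery.
        state["unacked"].append({"to": "peer-1", "step": state["step"]})
        save_checkpoint(state, path)
    return state
```

Calling `run(path, 5, fail_at=3)` raises partway through; calling `run(path, 5)` afterwards resumes from the saved step instead of restarting from zero. A real system in the spirit of the abstract would checkpoint the actual process image and coordinate checkpoints across processes, which this sketch does not attempt.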

