
Improving the Performance of Checkpointing Scheme with Task Duplication

Cited by: 4
Abstract: Checkpointing is a common technique for reducing the execution time of programs in the presence of faults. Combining checkpointing with task duplication provides not only effective fault recovery but also complete fault detection. The overhead of such systems comes from two sources: the cost of comparing and saving state at each checkpoint, and the rollbacks caused by faults. This paper improves the method proposed by Ziv and Bruck by employing incremental checkpointing. The improved method reduces the overhead of comparing and saving checkpoints, and it also avoids rollbacks caused by latent faults. Analysis shows that the improved method performs better than that of Ziv and Bruck.
Source: Acta Electronica Sinica (《电子学报》; indexed in EI, CAS, CSCD, PKU Core), 2000, No. 5, pp. 33-35, 28 (4 pages in total)
Funding: National Natural Science Foundation of China (No. 69873013)
Keywords: fault tolerance; checkpoint; rollback recovery; task duplication; program
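
The scheme summarized in the abstract can be illustrated with a small simulation. The sketch below is not the paper's algorithm: it is a minimal model, under assumptions of our own, of two replicas that execute the same task, compare only the pages written since the last checkpoint (an incremental comparison), save those pages when they agree, and roll back when they differ. The page-based state, the names `run_interval` and `run_duplicated`, and the fault-injection triples are all hypothetical. It does not model the paper's handling of latent faults, and identical faults in both replicas would go undetected.

```python
from typing import Dict, Set, Tuple

State = Dict[int, int]  # page number -> page contents (hypothetical model)


def run_interval(state: State, step: int, corrupt: bool) -> Set[int]:
    """Execute one interval in place and return the set of pages it wrote."""
    page = step % 4                      # each interval writes a single page
    state[page] = state.get(page, 0) + step + 1
    if corrupt:                          # injected transient fault flips one bit
        state[page] ^= 1
    return {page}


def run_duplicated(n_intervals: int, faults: Set[Tuple[int, int, int]]):
    """Run two replicas with compare-and-save at every checkpoint.

    faults holds (replica, interval, attempt) triples at which a fault
    is injected. Returns (final_state, rollbacks, pages_compared).
    """
    checkpoint: State = {}               # last agreed state
    replicas = [dict(checkpoint), dict(checkpoint)]
    rollbacks = 0
    pages_compared = 0
    for step in range(n_intervals):
        attempt = 0
        while True:
            dirty: Set[int] = set()
            for r, state in enumerate(replicas):
                dirty |= run_interval(state, step, (r, step, attempt) in faults)
            # Incremental comparison: only pages written since the last checkpoint.
            pages_compared += len(dirty)
            if all(replicas[0][p] == replicas[1][p] for p in dirty):
                for p in dirty:          # incremental save of the dirty pages
                    checkpoint[p] = replicas[0][p]
                break
            # Mismatch: a fault is detected, so both replicas roll back.
            rollbacks += 1
            replicas = [dict(checkpoint), dict(checkpoint)]
            attempt += 1
    return checkpoint, rollbacks, pages_compared
```

Calling `run_duplicated(8, {(0, 2, 0)})` injects one fault into replica 0 during the first attempt at interval 2. The mismatch forces a single rollback, after which the run ends in the same state as a fault-free run.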

References (9)

  • [1] A. Ziv and J. Bruck. Performance optimization of checkpointing schemes with task duplication. IEEE Trans. Computers, Dec. 1997, 46(12): 1381-1386
  • [2] A. Ziv and J. Bruck. Analysis of checkpointing schemes with task duplication. IEEE Trans. Computers, Feb. 1998, 47(2): 222-227
  • [3] D. P. Siewiorek and R. S. Swarz. The theory and practice of reliable system design. Digital Press, 1982
  • [4] P. Agrawal. Fault tolerance in multiprocessor systems without dedicated redundancy. IEEE Trans. Computers, Mar. 1988, 37(3): 358-362
  • [5] A. Duda. The effects of checkpointing on program execution time. Information Processing Letters, June 1983, 16: 221-229
  • [6] D. K. Pradhan and N. H. Vaidya. Roll-forward and rollback recovery: performance-reliability trade-off. Proc. 24th IEEE Int'l Symp. Fault-Tolerant Computing, June 1994: 186-195
  • [7] J. Long, W. K. Fuchs, and J. A. Abraham. Forward recovery using checkpointing in parallel systems. Proc. 19th Int'l Conf. Parallel Processing, Aug. 1990: 272-275
  • [8] E. N. Elnozahy, D. B. Johnson, and W. Zwaenepoel. The performance of consistent checkpointing. The 11th Symposium on Reliable Distributed Systems, 1992: 39-47
  • [9] J. S. Plank, M. Beck, and G. Kingsley. Libckpt: transparent checkpointing under Unix. 1995 USENIX Technical Conference, 1995: 213-223

