Abstract
Checkpointing is a common technique for reducing the execution time of programs that run under fault conditions. Combining checkpointing with task duplication provides not only effective fault recovery but also thorough fault detection. The overhead of such a system has two main sources: comparing and saving the state at each checkpoint, and rolling back after a fault. This paper improves the method proposed by Ziv and Bruck by employing incremental checkpointing. The improved method reduces the cost of comparing and saving checkpoints and, in addition, avoids the rollbacks caused by latent faults. Analysis shows that the improved method performs better than that of Ziv and Bruck.
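To make the cost argument concrete, the following is a minimal, generic sketch of page-level incremental checkpointing in a duplex (two-replica) system. It is not the authors' scheme. Several things here are illustrative assumptions: the page granularity, the tiny `PAGE_SIZE`, and the names `Replica`, `incremental_checkpoint` and `make_pages`. The idea is that only the pages modified since the last checkpoint are compared between the replicas and then saved. On a mismatch, both replicas roll back to the last stored checkpoint. The sketch does not model latent-fault handling, checkpoint placement, or the paper's performance analysis.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4  # hypothetical page size in bytes, kept tiny for illustration


def make_pages(n_pages: int) -> list[bytearray]:
    """Hypothetical helper: a zero-initialised state of n_pages pages."""
    return [bytearray(PAGE_SIZE) for _ in range(n_pages)]


@dataclass
class Replica:
    """One copy of a duplicated task; its state is a list of fixed-size pages."""
    pages: list[bytearray]
    dirty: set[int] = field(default_factory=set)  # pages written since last checkpoint

    def write(self, addr: int, value: int) -> None:
        page = addr // PAGE_SIZE
        self.pages[page][addr % PAGE_SIZE] = value
        self.dirty.add(page)  # remember the page for the next incremental checkpoint


def incremental_checkpoint(a: Replica, b: Replica,
                           saved: list[bytearray]) -> tuple[bool, int]:
    """Compare and save only the pages dirtied since the last checkpoint.

    Invariant assumed: at the last checkpoint, `saved` equalled both replicas,
    so clean pages never need comparing, saving, or restoring.
    Returns (ok, pages_compared).
    """
    changed = sorted(a.dirty | b.dirty)
    if all(a.pages[p] == b.pages[p] for p in changed):
        for p in changed:
            saved[p] = bytearray(a.pages[p])  # store only the increment
        ok = True
    else:
        # Mismatch means a fault was detected: roll both replicas back
        # by restoring their dirty pages from the stored checkpoint.
        for p in changed:
            a.pages[p] = bytearray(saved[p])
            b.pages[p] = bytearray(saved[p])
        ok = False
    a.dirty.clear()
    b.dirty.clear()
    return ok, len(changed)


if __name__ == "__main__":
    a, b = Replica(make_pages(8)), Replica(make_pages(8))
    saved = make_pages(8)
    a.write(5, 1); b.write(5, 1)
    print(incremental_checkpoint(a, b, saved))  # (True, 1)
    a.write(9, 7); b.write(9, 3)                # simulated fault in replica b
    print(incremental_checkpoint(a, b, saved))  # (False, 1)
```

In the demo, each checkpoint compares one page instead of all eight. This is where incremental checkpointing saves comparison and storage work relative to comparing and saving the full state.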
Source
Acta Electronica Sinica (电子学报), 2000, No. 5, pp. 33-35, 28 (4 pages)
Indexed in: EI, CAS, CSCD, Peking University Core Journals (北大核心)
Funding
National Natural Science Foundation of China (No. 69873013)
Keywords
fault tolerance
checkpoint
rollback recovery
task duplication
program