Abstract
Checkpointing is an important means of achieving fault tolerance in parallel systems, and synchronous checkpointing has been widely used in workstation cluster systems. The message-passing mechanism provided by PVM supports efficient heterogeneous network computing but offers no fault tolerance. To reduce the time overhead of setting synchronous checkpoints, a PVM-based quasi-synchronous checkpointing method is proposed. It retains the advantages of synchronous checkpointing while using message logging to let each node save its state independently, which greatly reduces checkpoint synchronization overhead and improves the efficiency of checkpoint operations. The method was implemented in a PVM environment, and experimental results show that it provides good fault-tolerance performance.
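The idea of saving state independently while logging messages can be illustrated with a minimal sketch. This is not the paper's implementation and does not use the PVM API: the `Node` class, its methods, and the running-sum "application" are all hypothetical, and the checkpoint and message log are assumed to survive a crash (as if held on stable storage).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical worker whose application state is a running sum."""
    name: str
    total: int = 0                            # volatile application state
    checkpoint: int = 0                       # last saved state (assumed stable)
    log: list = field(default_factory=list)   # messages since checkpoint (assumed stable)

    def receive(self, value: int) -> None:
        # Record the message before applying it, so it can be replayed later.
        self.log.append(value)
        self.total += value

    def take_checkpoint(self) -> None:
        # Purely local save: no barrier or handshake with other nodes.
        self.checkpoint = self.total
        self.log.clear()

    def crash(self) -> None:
        # Only the volatile state is lost.
        self.total = 0

    def recover(self) -> None:
        # Restore the saved state, then replay the logged messages in order.
        self.total = self.checkpoint
        for value in self.log:
            self.total += value
```

In this sketch a node that checkpoints, receives further messages, and then crashes can rebuild its pre-crash state from its own checkpoint and log alone, which is why no global synchronization is needed at checkpoint time.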
Source
《计算机工程与设计》
CSCD
Peking University Core Journal
2006, No. 3, pp. 494-496 (3 pages)
Computer Engineering and Design
Keywords
checkpoint
quasi-synchronization
message