Abstract
Fault-tolerant design is an effective way to improve the reliability of computer systems. This paper proposes a fault-tolerant architecture for parallel computer systems with distributed shared main memory, analyzes in detail the fault diagnosis mechanism the architecture employs, and presents an optimization strategy for configuring the system's standby nodes.
Source
《计算机工程与科学》
CSCD
2005, No. 9, pp. 69-70, 84 (3 pages)
Computer Engineering & Science
Keywords
parallel computer system
fault-tolerance
reliability
fault diagnosis