Abstract
Soft errors are a major factor affecting computing reliability in high-radiation space environments. Silent data corruption (SDC) is a special type of failure caused by soft errors: the program produces a wrong result without any crash or other detectable symptom. Because the faults that cause SDC propagate silently, SDC is difficult to detect. This paper proposes an SDC detection approach based on program invariants. An invariant is a program property that holds throughout normal execution. After a soft error occurs, the program state is disturbed and the invariant is generally violated. Following this principle, assertions that encode invariants are inserted into the source code, and an assertion failure indicates that a soft error has been detected. First, detection locations are chosen by analyzing how SDC-causing faults propagate, and the invariants at those locations are extracted. Permeability, a measure of an invariant's ability to detect soft errors, is then defined, and at each detection location invariants are converted into assertions according to their permeability. Fault injection experiments validate the approach and show that it achieves a high detection rate at low detection cost. The approach offers a new way to protect satellite-borne systems against soft errors.
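The principle described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the invariant here is chosen by hand rather than extracted by analysis, and the names `flip_bit`, `checked_sum`, `run`, and the `fault` parameter are hypothetical, introduced only for this example. A single bit of a running total is flipped to mimic a single event upset, and an assertion that encodes an invariant of the loop reports the corruption.

```python
def flip_bit(value: int, bit: int) -> int:
    """Mimic a single event upset by flipping one bit of an integer."""
    return value ^ (1 << bit)


def checked_sum(samples, fault=None):
    """Sum samples in [0, 255], guarded by an invariant-based assertion.

    fault: optional (index, bit) pair. When given, one bit of the running
    total is flipped right after that sample is added. This is a
    hypothetical fault-injection hook used only for illustration.
    """
    total = 0
    for i, s in enumerate(samples):
        previous = total
        total += s
        if fault is not None and fault[0] == i:
            total = flip_bit(total, fault[1])
        # Hand-picked invariant at this location: every sample lies in
        # [0, 255], so the running total never decreases and grows by at
        # most 255 per iteration.
        assert previous <= total <= previous + 255, "invariant violated"
    return total


def run(samples, fault=None):
    """Return ('ok', result), or ('detected', None) if the assertion fires."""
    try:
        return "ok", checked_sum(samples, fault)
    except AssertionError:
        return "detected", None


print(run([10, 20, 30]))                 # fault-free run
print(run([10, 20, 30], fault=(1, 20)))  # high-order bit flipped
print(run([10, 20, 30], fault=(1, 0)))   # low-order bit flipped
```

In this sketch the flip of a high-order bit violates the invariant and is reported, while the flip of the lowest bit stays within the allowed range and yields a silently wrong sum. This is why invariants differ in how well they detect soft errors, which is the property the paper quantifies with permeability; the sketch does not attempt to compute that measure. Note that Python removes `assert` statements when run with `-O`, so a real checker would use explicit checks.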
Source
《软件学报》
EI
CSCD
PKU Core Journal (北大核心)
2016, No. 2, pp. 219-230 (12 pages)
Journal of Software
Keywords
single event upset
silent data corruption
error detection
program invariant