Funding: Supported by the National Basic Research Program of China (973 Program, 2004CB318200).
Abstract: In real-time fault-tolerant scheduling for multiprocessor systems, the primary-backup scheme plays an important role. A backup copy is preferably executed as a passive backup whenever possible, because passive backups can take advantage of the backup de-allocation and overloading techniques to improve schedulability. In this paper, we propose a novel efficient fault-tolerant rate-monotonic best-fit (ERMBF) algorithm for multiprocessor systems to enhance schedulability. Unlike existing scheduling algorithms, which start scheduling tasks with only one processor, ERMBF pre-allocates a certain number of processors before scheduling begins, which enlarges the search space for tasks. In addition, when a new processor is allocated, we reassign the task copies already assigned to the existing processors in order to find a better task assignment. Both strategies aim to let as many backup copies as possible execute as passive backups. As a result, ERMBF can schedule a task set on fewer processors without losing the real-time and fault-tolerant capabilities of the system. Simulation results show that ERMBF significantly improves schedulability over comparable existing algorithms in the literature.
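The distinction between passive and active backups that this abstract relies on can be illustrated with a small sketch. It is not the ERMBF algorithm itself, whose details the abstract does not give. It assumes a simplified criterion common in the primary-backup literature: on one processor under preemptive fixed-priority scheduling, compute the primary's worst-case response time R, and treat the backup as passive if the window left before the deadline (taken here as equal to the period) is at least the execution time C. The function names, integer time units, and the equal execution time of primary and backup are all assumptions made for illustration.

```python
def response_time(c, higher, limit):
    """Worst-case response time of a task with execution time c on one
    processor under preemptive fixed-priority scheduling.

    higher: list of (C_j, T_j) pairs for higher-priority tasks on the
            same processor (integer time units assumed).
    Returns None if the response time would exceed `limit`.
    """
    r = c
    while True:
        # Interference: each higher-priority task releases ceil(r / T_j) jobs.
        nxt = c + sum(-(-r // t_j) * c_j for c_j, t_j in higher)
        if nxt > limit:
            return None
        if nxt == r:
            return r
        r = nxt


def backup_status(c, period, higher):
    """Classify the backup of task (c, period); deadline = period (assumed)."""
    r = response_time(c, higher, period)
    if r is None:
        return "primary unschedulable"
    recovery_window = period - r  # time left after the primary's worst-case finish
    # Assumed criterion: the backup can wait for the primary (passive)
    # only if it still fits entirely in the remaining window.
    return "passive" if recovery_window >= c else "active"


print(backup_status(2, 10, [(1, 4)]))          # lightly loaded processor
print(backup_status(3, 10, [(1, 4), (2, 6)]))  # heavily loaded processor
```

Under these assumptions, placing a primary on a less loaded processor shortens its response time and widens the recovery window, which is why a larger search space over processors can turn more backups passive.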
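Abstract: Fault tolerance is a key capability of hard real-time systems; a fault-tolerant scheduling algorithm can meet the real-time requirements of tasks even when faults occur. In fault-tolerant scheduling algorithms based on the primary-backup mechanism, the time window left for the backup copy after the primary copy fails is small, so the backup copy can easily miss its deadline. To address the need for backup copies to respond quickly, a global fault-tolerant scheduling algorithm with non-preemptive backups, FTGS-NPB (fault-tolerant global scheduling with non-preemptive backups), is proposed. It gives backup copies the globally highest priority, so that a backup copy obtains processor resources immediately after its primary copy fails and is not preempted by other tasks while it runs. In this way, the backup copy responds in the shortest possible time. Schedulability tests for FTGS-NPB are established based on deadline analysis and on response-time analysis respectively, and the analysis shows that the two tests suit different priority assignment algorithms. Simulation results show that FTGS-NPB effectively reduces the cost of achieving fault tolerance.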
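Abstract: To address task faults that occur while multiprocessor hard real-time systems are running, a fixed-priority global fault-tolerant scheduling algorithm based on the primary-backup strategy, FTGS-BD (fault tolerant global scheduling with backup delay), is proposed. The algorithm uses both active and passive backup copies. While guaranteeing real-time behavior, it postpones the execution of active backup copies as far as possible according to task requirements and hardware performance, and it reclaims the resources allocated to a backup copy when that copy does not need to respond, thereby reducing the cost of achieving fault tolerance. Simulation results show that, compared with global fault-tolerant scheduling algorithms that use only passive backup copies, FTGS-BD reduces processor resource requirements by up to 20%, and by 12% on average, when scheduling the same task sets. FTGS-BD can be applied to task sets in which the maximum combined utilization of a primary copy and its backup copy exceeds 1.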
Funding: Supported by the National Key R&D Program of China (Grant 2018YFB1403602); Chongqing Technological Innovation Foundations (Grants cstc2019jscx-msxm0652 and cstc2019jscx-fxyd0385); Chongqing Key R&D Project (Grant cstc2018jszx-cyzdX0081); Jiangxi Key R&D Project (Grant 2018ACE50029); a technological program organized by SGCC (No. 52094020000U); and the Technology Innovation and Application Development Foundation of Chongqing (Grant cstc2020jscx-gksbX0010).
Abstract: As a newly emerging computing paradigm, edge computing shows great capability in supporting and boosting 5G and Internet-of-Things (IoT) oriented applications, e.g., scientific workflows, with low-latency, elastic, and on-demand provisioning of computational resources. However, geographically distributed IoT resources are usually interconnected through unreliable communications and operate in ever-changing contexts, which introduces strong heterogeneity, potential vulnerability, and instability into computing infrastructures at different levels. It thus remains a challenge to ensure high fault tolerance for edge-IoT scientific computing task flows, especially when the supporting computing infrastructures are deployed in a collaborative, distributed, and dynamic environment that is prone to faults and failures. This work proposes a novel fault-tolerant scheduling approach for edge-IoT collaborative workflows. The proposed approach first conducts a dependency-based task allocation analysis, then leverages a Primary-Backup (PB) strategy to tolerate task failures that occur at edge nodes, and finally designs a deep Q-learning algorithm to identify a near-optimal workflow task scheduling scheme. We conduct extensive simulation-based case studies on multiple randomly generated workflows and real-world edge-IoT server position datasets. The results suggest that the proposed method outperforms state-of-the-art competitors in terms of task completion ratio, server active time, and resource utilization.
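Abstract: Primary/backup copy techniques are a commonly used approach to real-time fault-tolerant scheduling in distributed systems. However, a traditional active backup copy must run to completion on the backup processor even when no processor fails, which increases processor consumption. A deferred active backup-copy technique based on fixed-priority scheduling is proposed. It reduces backup redundancy by scheduling active backup copies as late as possible and by terminating a backup copy once its primary copy executes successfully. Building on this technique, a heuristic task allocation algorithm whose optimization goal is to minimize the number of processors is proposed: DABCBF (deferred active backup-copy based best-fit algorithm). While guaranteeing the real-time and fault-tolerant capabilities of the system, DABCBF reduces redundancy as far as possible by minimizing the worst-case response time of primary copies, thereby saving processors. Finally, simulation experiments demonstrate the feasibility and effectiveness of the algorithm.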
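The effect of deferring an active backup copy can be shown with a small arithmetic sketch. It is an illustration under stated assumptions, not the DABCBF algorithm: the backup is assumed to run alone on its processor, so its latest start time is simply the deadline minus its execution time, and it is terminated as soon as the primary finishes successfully. The function name and parameters are hypothetical.

```python
def redundant_backup_time(c_backup, deadline, primary_finish, deferred=True):
    """Processor time spent on a backup copy whose primary succeeds.

    c_backup:       execution time of the backup copy
    deadline:       relative deadline of the task
    primary_finish: time at which the primary completes successfully
    deferred:       if False, model a traditional active backup that
                    always runs to completion
    Assumes no interference on the backup processor (illustrative only).
    """
    if not deferred:
        return c_backup  # traditional active backup runs fully
    latest_start = deadline - c_backup
    # The backup runs only between its latest start and the moment the
    # primary succeeds; if the primary finishes first, it never starts.
    overlap = min(primary_finish, deadline) - latest_start
    return max(0, min(overlap, c_backup))


# Task with C = 3 and deadline 10, so the deferred backup starts at time 7.
print(redundant_backup_time(3, 10, primary_finish=8))                  # 1
print(redundant_backup_time(3, 10, primary_finish=6))                  # 0
print(redundant_backup_time(3, 10, primary_finish=8, deferred=False))  # 3
```

In this simplified model, the earlier the primary finishes, the less of the backup ever executes, which is consistent with the abstract's point that reducing the primary's worst-case response time reduces redundancy.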