Research on Fault-Tolerance Techniques for Real-Time Systems Based on Hierarchical Scheduling

Published: 2018-10-23 14:20

[Abstract]: In recent years, real-time systems have been widely applied in safety-critical automotive electronics. Besides guaranteeing the logical correctness of the outputs of real-time applications, such systems must provide strict timing determinism and high reliability; otherwise the consequences can be severe. As application requirements grow, however, real-time application software is becoming larger and more complex, and its safety and reliability problems are increasingly prominent. It is therefore essential to use fault-tolerance techniques so that a real-time system behaves controllably after a fault occurs. This thesis studies the hierarchical real-time scheduling framework, analyzes in detail the task-replication-based fault-tolerance algorithms widely used in real-time systems, and identifies two problems: 1) existing system reliability models assume that only one fault is present in the system at any moment and that the previous fault has been resolved before the next one arrives, an idealized assumption that limits their practicality; 2) blindly giving every task ε+1 replicas to tolerate up to ε faults improves reliability but easily leads to excessive redundancy, which may cause tasks to miss their deadlines as they compete for computing resources. To address these problems, this thesis draws on the component-based design and analysis methods provided by the hierarchical scheduling framework and on active task replication, and studies fault-tolerance theory and algorithms for real-time systems on multiprocessor platforms, with the aim of meeting the system reliability goal while minimizing redundant resources. First, a replication-based fault-tolerance algorithm for periodic tasks on homogeneous systems is proposed. In the modeling stage, one hyperperiod is taken as the unit of quantification and the reliability of the system as a whole is analyzed with probabilistic and statistical methods, yielding a system reliability model for periodic task sets. In the precise quantitative analysis stage, a method for computing bounds on the number of task replicas is derived from the system reliability goal and the analysis of the reliability model, which avoids blind task replication to some extent. Next, based on how tasks differ in their contribution to system reliability and in the computing resources they occupy, an economical task replication strategy is proposed that dynamically determines the number of replicas of each task while keeping the additional computing resources consumed by replication as small as possible. Second, a replication-based fault-tolerance algorithm for DAG tasks on heterogeneous systems is proposed. In the modeling stage, a reliability model for a single DAG is built by analyzing the task dependencies in the DAG, and on that basis the reliability of multi-DAG systems is modeled. In the quantitative analysis stage, using the established reliability model and the idea of selecting, at each step, the processor with the minimum reliability cost, an algorithm for the lower bound on the number of task replicas is proposed. The economical task replication strategy above is then applied: driven by the system reliability goal, it dynamically quantifies the number of redundant replicas each task needs and assigns tasks to processors for scheduling on the multiprocessor platform. Simulation results show that, compared with earlier blind task-replication fault-tolerance methods, these algorithms meet the system reliability goal as a whole while minimizing the redundant computing resources consumed.
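The abstract does not give the thesis's formulas, so the following is only a minimal sketch of the general idea for the periodic, homogeneous case, not the author's algorithm. It assumes independent transient faults arriving as a Poisson process with rate `fault_rate`, treats a job as successful if at least one of its replicas runs fault-free, multiplies job reliabilities over one hyperperiod, and uses a simple total-utilization cap (`capacity`) in place of a real schedulability test. All names (`Task`, `log_system_reliability`, `economical_replication`) are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    wcet: float      # worst-case execution time of one replica
    period: float    # period (deadline assumed equal to period)
    replicas: int = 1


def job_reliability(task: Task, fault_rate: float) -> float:
    """Probability that at least one replica of a single job is fault-free.

    Assumption: faults are Poisson with rate `fault_rate`, replicas fail
    independently.
    """
    single = math.exp(-fault_rate * task.wcet)
    return 1.0 - (1.0 - single) ** task.replicas


def log_system_reliability(tasks, fault_rate: float, hyperperiod: float) -> float:
    """Log of the probability that every job in one hyperperiod succeeds."""
    total = 0.0
    for t in tasks:
        jobs = round(hyperperiod / t.period)  # jobs released per hyperperiod
        total += jobs * math.log(job_reliability(t, fault_rate))
    return total


def utilization(tasks) -> float:
    """Total processor utilization, counting every replica."""
    return sum(t.replicas * t.wcet / t.period for t in tasks)


def economical_replication(tasks, fault_rate, hyperperiod, goal, capacity) -> bool:
    """Greedily add replicas until the reliability goal is met.

    Each step adds one replica to the task with the largest gain in log
    reliability per unit of extra utilization. Returns False if the goal
    cannot be reached within `capacity` (an assumed utilization cap).
    """
    target = math.log(goal)
    while log_system_reliability(tasks, fault_rate, hyperperiod) < target:
        base = log_system_reliability(tasks, fault_rate, hyperperiod)
        best, best_score = None, 0.0
        for t in tasks:
            extra = t.wcet / t.period
            if utilization(tasks) + extra > capacity:
                continue  # this replica would exceed the assumed capacity
            t.replicas += 1
            gain = log_system_reliability(tasks, fault_rate, hyperperiod) - base
            t.replicas -= 1
            if gain / extra > best_score:
                best, best_score = t, gain / extra
        if best is None:
            return False
        best.replicas += 1
    return True


if __name__ == "__main__":
    demo = [Task("a", wcet=2, period=10), Task("b", wcet=5, period=20)]
    ok = economical_replication(demo, fault_rate=0.01, hyperperiod=20,
                                goal=0.999, capacity=2.0)
    print(ok, [(t.name, t.replicas) for t in demo])
```

Compared with giving every task the same ε+1 replicas, a per-task greedy choice of this kind spends extra utilization only where it buys the most reliability, which is the trade-off the abstract describes; the thesis's actual bounds and selection rule may differ.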
[Degree-granting institution]: Hunan University
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP302.8
