Research on System Performance Optimization Techniques Based on the Checkpoint Mechanism
Published: 2018-12-28 18:56
[Abstract]: Computer systems are now widely used in transportation, medicine, navigation, aviation, and many other fields, and the demands placed on their reliability keep rising. In practice, the nature of hardware and software makes it impossible for a system never to fail. Consider a long-running task: if a failure occurs during execution, the task must restart from the beginning, wasting the work already done. Being able to tolerate faults when they occur is therefore especially important. Checkpointing is one such effective fault-tolerance technique, widely used in computer and database systems to improve reliability. By setting checkpoints at intervals while the task runs, it avoids losing a large amount of computation when a failure occurs and so improves system performance.
To address the high checkpointing overhead of the one-level recovery scheme, Vaidya proposed the two-level recovery scheme, which aims to reduce checkpointing overhead during task execution. The two-level scheme uses two types of checkpoints with different overheads, N-checkpoints and local checkpoints, which are saved to remote storage and to local disk, respectively. Taking a local checkpoint costs less than taking an N-checkpoint. To optimize performance, Vaidya derived a checkpoint placement strategy for exponentially distributed failures by numerical search.
This thesis proposes a new two-level checkpoint placement strategy that determines where local checkpoints and N-checkpoints are placed over the whole run. The strategy applies not only when failures follow an exponential distribution but also to more complex distributions such as the Weibull distribution. The results show that the proposed strategy achieves good performance. The thesis also analyzes the factors that affect the optimal number of local checkpoints between adjacent N-checkpoints. The results show that this number is influenced by the ratio of the overheads of the two checkpoint types and by the ratio of the probabilities of the two kinds of failure.
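The trade-off described in the abstract can be explored with a small Monte Carlo sketch. This is not the thesis's placement strategy or Vaidya's numerical search; it is only an illustrative model built on assumptions that do not come from the source: work is split into equal segments, every `(k_local + 1)`-th checkpoint is an N-checkpoint, each failure is "severe" (forcing rollback to the latest N-checkpoint) with probability `p_severe`, recovery itself costs no time, and the failure clock restarts after every failure. All function and parameter names are hypothetical.

```python
import random

def exponential(rate):
    """Time-to-failure sampler for exponentially distributed failures."""
    return lambda rng: rng.expovariate(rate)

def weibull(scale, shape):
    """Time-to-failure sampler for Weibull-distributed failures."""
    return lambda rng: rng.weibullvariate(scale, shape)

def simulate_run(work, segment, k_local, c_local, c_remote,
                 p_severe, draw_ttf, rng):
    """Return the wall-clock time of one run under the assumed two-level model."""
    n_segments = round(work / segment)
    done = 0          # segments protected by the latest checkpoint of any type
    last_remote = 0   # segments protected by the latest N-checkpoint
    clock = 0.0
    next_fail = draw_ttf(rng)  # time remaining until the next failure
    while done < n_segments:
        # Every (k_local + 1)-th checkpoint, and the final one, is an N-checkpoint.
        is_remote = (done + 1) % (k_local + 1) == 0 or done + 1 == n_segments
        cost = segment + (c_remote if is_remote else c_local)
        if next_fail >= cost:
            # Segment and its checkpoint finish before the next failure.
            clock += cost
            next_fail -= cost
            done += 1
            if is_remote:
                last_remote = done
        else:
            # Failure strikes mid-segment: the partial work is lost.
            clock += next_fail
            if rng.random() < p_severe:
                done = last_remote  # local checkpoints are lost too
            next_fail = draw_ttf(rng)  # assumption: failure clock restarts
    return clock

def mean_time(runs, seed=0, **kw):
    """Average completion time over `runs` independent simulated runs."""
    rng = random.Random(seed)
    return sum(simulate_run(rng=rng, **kw) for _ in range(runs)) / runs

if __name__ == "__main__":
    # Compare different numbers of local checkpoints between N-checkpoints.
    for k in (0, 1, 3, 7):
        t = mean_time(2000, work=1000.0, segment=10.0, k_local=k,
                      c_local=0.5, c_remote=5.0, p_severe=0.1,
                      draw_ttf=weibull(200.0, 0.7))
        print(k, round(t, 1))
```

Varying `c_local / c_remote` and `p_severe` in this sketch gives a rough feel for why the abstract names the overhead ratio and the failure probability ratio as the factors behind the best number of local checkpoints, although the numbers it prints say nothing about the thesis's actual results.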
[Degree-granting institution]: Xidian University
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: TP302.8
Article ID: 2394304
Article link: https://www.wllwen.com/kejilunwen/jisuanjikexuelunwen/2394304.html