
Design and Implementation of a Fault Injection Tool for the RAS Mechanisms of Multiple Processors

Published: 2018-05-29 13:11

  Topic: reliability + availability; Source: Harbin Institute of Technology, master's thesis, 2012


[Abstract]: Server systems for mission-critical workloads need not only strong processing capability but also strong fault tolerance. To improve fault tolerance and availability, computer vendors have designed and implemented a wide variety of RAS (reliability, availability, and serviceability) mechanisms. Evaluating a system's fault tolerance with respect to its RAS mechanisms can give vendors targeted suggestions for improvement, and fault injection makes that evaluation considerably more efficient. In addition, comparing the fault tolerance of existing server systems side by side with a unified set of test cases can reveal the strengths and weaknesses of the fault-tolerance mechanisms of different architectures.

Following the FARM model, this thesis starts from the causes of errors, classifies errors by severity, establishes a fault set oriented to RAS mechanisms, and builds a fault injection model for evaluating the fault tolerance of multiple platforms. Then, following two fault injection approaches, driver-layer simulation and injection through testability interfaces, it designs and implements a CPU fault injection tool and a register fault injection tool. The tools run on three architectures (x86, ia64, and sparc) under the corresponding Linux and Solaris operating systems. They can inject single-bit or multi-bit data errors into internal components such as the cache and TLB and into registers, targeting both critical system processes and application processes.

To verify the effectiveness of the tools, four different computer systems were selected for cache and register fault injection experiments. The results confirm that the tools are effective: the same test cases can be run against all four systems, and the results can be compared side by side to evaluate each system's fault tolerance qualitatively. The experiments also analyze the respective strengths and weaknesses of the MCA mechanism and the predictive recovery mechanism in error handling. Register fault injection tests on different server systems lead to the conclusion that improving fault tolerance requires pairing the hardware with a suitable operating system.
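
The single-bit and multi-bit data errors mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's tool, which works at the driver layer and through hardware testability interfaces; it only models the corruption on a plain integer that stands in for a register or cache word. The function names `flip_bits` and `inject_random_fault` and the 64-bit default width are assumptions made for illustration.

```python
import random


def flip_bits(value, positions, width=64):
    """Return `value` with the bits at `positions` inverted.

    `value` is a plain integer standing in for a register or cache word
    (hypothetical model; no real hardware state is touched).
    """
    mask = 0
    for pos in positions:
        if not 0 <= pos < width:
            raise ValueError(f"bit position {pos} is outside a {width}-bit word")
        mask |= 1 << pos  # a repeated position still flips that bit only once
    # XOR inverts exactly the masked bits; the final AND keeps the word width.
    return (value ^ mask) & ((1 << width) - 1)


def inject_random_fault(value, n_bits=1, width=64, rng=None):
    """Flip `n_bits` distinct, randomly chosen bits of `value`.

    Returns the corrupted word and the sorted list of flipped positions,
    so an experiment log can record exactly which fault was injected.
    """
    rng = rng or random.Random()
    positions = rng.sample(range(width), n_bits)
    return flip_bits(value, positions, width), sorted(positions)


if __name__ == "__main__":
    original = 0x00000000DEADBEEF
    corrupted, flipped = inject_random_fault(original, n_bits=2, rng=random.Random(0))
    print(f"original : {original:#018x}")
    print(f"corrupted: {corrupted:#018x}")
    print(f"flipped bit positions: {flipped}")
```

Passing a seeded `random.Random` makes an injection run repeatable, which matters when the same fault has to be replayed on several systems for a side-by-side comparison.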
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree conferred]: 2012
[Classification number]: TP302.8





Article link: https://www.wllwen.com/kejilunwen/jisuanjikexuelunwen/1950997.html

