
Research on Availability Evaluation Techniques for Complex Computer Systems

Published: 2018-02-26 11:24

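  Keywords: fault-tolerant computing; availability; correlation analysis; failure distribution; copula function; availability evaluation. Source: Harbin Institute of Technology, 2013 doctoral dissertation. Document type: dissertation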


[Abstract]: Complex computer systems used in key industries that bear on national economic and social security, such as finance, telecommunications, energy, transportation, and aviation, must offer not only very strong transaction-processing capability but also very high availability, so that they can provide fast, stable information-processing services. Delays or failures in such systems can cause incalculable economic losses and may also have negative social effects. Research on availability testing for this class of complex computer systems will help improve their availability and is of great significance for keeping the national economy running smoothly.
Earlier studies have proposed that some pattern of correlation exists among the hardware components of a computer system and among software faults, and that this correlation affects system availability. Most of them, however, argue from theory and give no direct evidence that correlation exists in real systems, so discussions of correlation often lack support from actual systems and are not fully convincing. The thesis therefore analyzes the fault records of a bank's computer system and the operation logs of high-end servers, and points out evidence that correlation may exist among system-level components and element-level components. To model system availability better, the thesis compares the failure distributions of the bank system's fault records and the LANL failure data set, and finds that the time between hardware failures of computing systems based on a symmetric multiprocessor (SMP) architecture follows a distribution in the Weibull family.
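As an illustration of the distribution-fitting step described above, the following is a minimal sketch of a maximum-likelihood fit of a two-parameter Weibull distribution to inter-failure times. It is not the thesis's procedure: the function name `fit_weibull`, the bisection bounds, and the synthetic data (scale 1000 hours, shape 0.7) are assumptions made only for this example, and the bank and LANL data are not reproduced here.

```python
import math
import random


def fit_weibull(samples, lo=0.01, hi=100.0, iters=200):
    """Maximum-likelihood fit of a two-parameter Weibull.

    Returns (shape, scale). All samples must be strictly positive.
    """
    # Rescale by the largest sample so that x ** k cannot overflow;
    # the shape estimate is unaffected by a change of time unit.
    biggest = max(samples)
    scaled = [x / biggest for x in samples]
    logs = [math.log(x) for x in scaled]
    mean_log = sum(logs) / len(logs)

    def score(k):
        # Likelihood equation for the shape k; its root is the MLE.
        powered = [x ** k for x in scaled]
        weighted = sum(p * l for p, l in zip(powered, logs))
        return weighted / sum(powered) - 1.0 / k - mean_log

    # score(k) is increasing in k, so bisection locates the root.
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if score(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    shape = (lo + hi) / 2.0
    scale = biggest * (sum(x ** shape for x in scaled) / len(scaled)) ** (1.0 / shape)
    return shape, scale


# Synthetic inter-failure times (hypothetical, not taken from the thesis).
random.seed(0)
times = [random.weibullvariate(1000.0, 0.7) for _ in range(5000)]
shape_hat, scale_hat = fit_weibull(times)
print(f"shape ~ {shape_hat:.3f}, scale ~ {scale_hat:.1f}")
```

A fitted shape below 1 would indicate a decreasing hazard rate, which is one reason a Weibull family can describe failure data that an exponential model cannot.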
To meet high-availability requirements, complex computer systems in key industries often adopt a k-out-of-n architecture. The thesis focuses on modeling equal-load-sharing k-out-of-n systems with correlation taken into account. It first uses stochastic process theory to build a model of the equal-load-sharing k-out-of-n system, and shows that the time from the (i-1)-th component failure to the i-th component failure follows a two-parameter Weibull distribution and that the sojourn times of the system in different states are correlated. It then introduces copula theory, proposes using the Gumbel copula to capture changes in the right-tail dependence between sojourn times in different states, and gives an algorithm for computing the component correlation coefficient matrix of a k-out-of-n system with a specified failure sequence. The analysis shows that the equal-load-sharing k-out-of-n model that accounts for correlation is closer to the actual behavior of the system than a model that ignores it.
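To make the role of the Gumbel copula concrete, the sketch below evaluates the standard bivariate Gumbel copula and its upper (right) tail dependence coefficient. These are textbook formulas, not the thesis's correlation-matrix algorithm, which the abstract does not spell out; the function names and the parameter name `theta` are chosen here for illustration.

```python
import math


def gumbel_copula_cdf(u, v, theta):
    """C(u, v) = exp(-[(-ln u)^theta + (-ln v)^theta]^(1/theta)).

    Requires theta >= 1 and u, v in (0, 1]. theta = 1 gives independence.
    """
    if theta < 1.0:
        raise ValueError("theta must be >= 1 for the Gumbel copula")
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-(s ** (1.0 / theta)))


def upper_tail_dependence(theta):
    """Upper tail dependence coefficient: 2 - 2^(1/theta)."""
    return 2.0 - 2.0 ** (1.0 / theta)


def kendall_tau(theta):
    """Kendall's tau implied by the Gumbel copula: 1 - 1/theta."""
    return 1.0 - 1.0 / theta


for theta in (1.0, 2.0, 5.0):
    print(theta, upper_tail_dependence(theta), kendall_tau(theta))
```

Because the Gumbel copula has positive upper tail dependence for any theta greater than 1 and no lower tail dependence, it suits the case where long sojourn times in one state tend to coincide with long sojourn times in another.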
To describe component correlation intuitively, the thesis introduces DRBD (Dynamic Reliability Block Diagram), a system description model developed from the reliability block diagram. It presents the advantages of DRBD and applies the DRBD approach to describe several common system architectures, including the series reliability model, the common-cause/common-mode failure model, the redundancy model, and the RAID disk array model. It proposes a DRBD-based method for evaluating system availability and, for each of these component connection patterns, analyzes how to convert the model into a generalized stochastic Petri net (GSPN) and solve for availability.
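For contrast with the dynamic model, the following sketch computes availability for the classical static reliability block diagram, in which components are assumed independent. This is the baseline that DRBD extends, not the DRBD-to-GSPN conversion itself; the function names are hypothetical and the independence assumption is exactly what the thesis argues is too optimistic.

```python
from math import comb, prod


def series(avails):
    """Series block: the system is up only if every component is up."""
    return prod(avails)


def parallel(avails):
    """Parallel block: the system is down only if every component is down."""
    return 1.0 - prod(1.0 - a for a in avails)


def k_out_of_n(k, n, a):
    """k-out-of-n block of identical, independent components.

    The system is up when at least k of the n components are up.
    """
    return sum(comb(n, j) * a ** j * (1.0 - a) ** (n - j) for j in range(k, n + 1))


print(series([0.9, 0.9]))
print(parallel([0.9, 0.9]))
print(k_out_of_n(2, 3, 0.9))
```

Common-cause failures and load sharing break the independence assumption used here, which is why a state-based formalism such as a GSPN is needed to evaluate those structures.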
Traditional availability testing runs multiple identically configured target systems online for a long time. Because complex computer systems in key industries are highly available, however, online tracking tests take a very long time to produce accurate results. To address this, the thesis proposes an availability testing method for k-out-of-n systems based on an MTBF (mean time between failures) threshold, which converts system-level availability testing into availability testing of the redundant components. For transaction-processing fault-tolerant computer systems, the thesis designs and implements an availability evaluation system consisting of a fault injection platform, an availability evaluation suite, and an availability evaluation database. A simulated dual-modular application environment modeled on a bank's business system was built on HP Superdome servers. Online tests show that the evaluation results are of the same order of magnitude as the officially published figures, and that the proposed availability testing system can determine within a relatively short time whether the target system reaches the required availability level.
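The idea of an MTBF threshold can be illustrated with the usual steady-state relation A = MTBF / (MTBF + MTTR). The sketch below inverts that relation to obtain the smallest MTBF that meets a target availability for a given mean time to repair. It is an assumption-laden simplification: the abstract does not give the thesis's threshold formula or its treatment of k-out-of-n structure, and the function names and the sample figures are hypothetical.

```python
def availability(mtbf, mttr):
    """Steady-state availability from mean time between failures and mean time to repair."""
    return mtbf / (mtbf + mttr)


def mtbf_threshold(target, mttr):
    """Smallest MTBF that achieves the target availability for a given MTTR.

    Derived by solving target = mtbf / (mtbf + mttr) for mtbf.
    """
    if not 0.0 < target < 1.0:
        raise ValueError("target availability must lie strictly between 0 and 1")
    return target * mttr / (1.0 - target)


# Hypothetical example: 99.9% target availability with a 4-hour repair time.
needed = mtbf_threshold(0.999, 4.0)
print(f"required MTBF ~ {needed:.0f} hours")
```

Testing whether a component's observed MTBF clears such a threshold is a much shorter experiment than waiting for enough system-level outages to estimate availability directly.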

[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Doctorate
[Year degree granted]: 2013
[Classification number]: TP306




Article ID: 1537812


Article link: https://www.wllwen.com/kejilunwen/jisuanjikexuelunwen/1537812.html

