Research on Adaptive Fault Prediction Algorithms for Supercomputers
Topics: system fault tolerance + supercomputers; Source: Chongqing University, 2014 master's thesis
[Abstract]: With the development of information technology, cloud computing and other large-scale distributed systems have been widely deployed and applied. However, as the hardware and software of these systems grow more complex, ensuring that a system runs correctly for long periods and provides high-quality service to its users has become a key concern in the design and development of large-scale systems. If a large system can diagnose itself through a fault prediction strategy, its fault tolerance and resource scheduling can be greatly improved, which in turn ensures high availability and high reliability. Supercomputers are complex computer systems, so research on fault prediction for supercomputers is important for improving their computing performance and fault tolerance. Moreover, an effective fault prediction strategy can also be applied to other large systems to improve their fault tolerance.
Based on the system runtime logs of a supercomputer, this thesis first designs and implements a filtering algorithm based on semantic and temporal correlation (Semantic Time Filter Algorithm, STF) to preprocess the log records. STF measures the semantic correlation and the temporal correlation between log records and uses both to remove redundant records from the raw log. Experiments show that the filtered log sequence effectively reflects how the system evolves from non-fault events to fault events, which greatly helps the subsequent analysis and the construction of fault prediction models.
Building on an analysis of the filtered log records, this thesis applies the classification-based prediction approach of data mining: the time axis is divided into time windows of a fixed size, features are extracted from each time window, and fault prediction is performed per time window. During the training of an SVM classifier, the AdaBoost algorithm is used to adjust the classifier's kernel parameters dynamically according to the training set, so that the classifier improves through adaptive learning. The result is an adaptive fault prediction model, AdaBoostSVM.
Using 215 days of system runtime logs from the BlueGene/L supercomputer as the experimental data set, the prediction models are compared on the preprocessed data. The results show that the AdaBoostSVM model achieves better classification and prediction performance than fault prediction models based on the time between failures (TBF), kNN, RIPPER, and SVM. In particular, on recall, an important metric in fault prediction, AdaBoostSVM is 10%-20% higher than the other prediction models.
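The abstract does not say how STF measures semantic or temporal correlation. As a minimal sketch only, the Python below assumes that semantic correlation is the Jaccard similarity of the word sets of two messages, and that temporal correlation means falling within a fixed time threshold of an already kept record. The names `LogRecord`, `jaccard` and `stf_filter`, and both threshold defaults, are hypothetical and not taken from the thesis.

```python
from dataclasses import dataclass


@dataclass
class LogRecord:
    timestamp: float  # seconds since the start of the log (assumed field)
    message: str      # free-text message of the log entry (assumed field)


def jaccard(a, b):
    """Assumed semantic correlation: Jaccard similarity of the two messages' word sets."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0  # two empty messages are treated as identical
    return len(ta & tb) / len(ta | tb)


def stf_filter(records, time_threshold=300.0, sim_threshold=0.5):
    """Keep a record unless a recently kept record is both close in time and similar in text.

    time_threshold (seconds) and sim_threshold are hypothetical defaults.
    """
    kept = []
    for rec in sorted(records, key=lambda r: r.timestamp):
        redundant = False
        for prev in reversed(kept):
            if rec.timestamp - prev.timestamp > time_threshold:
                break  # kept is in time order, so every earlier record is farther away
            if jaccard(rec.message, prev.message) >= sim_threshold:
                redundant = True
                break
        if not redundant:
            kept.append(rec)
    return kept
```

For example, given records at 0 s ("link error on node 12"), 10 s ("link error on node 13"), 20 s ("memory parity error") and 1000 s ("link error on node 12"), the sketch drops only the 10 s record: it is close in time and shares four of six words with the first one. Comparing each record against kept records rather than against every earlier record means a long burst of identical messages still leaves roughly one representative per `time_threshold` interval, which preserves the timeline leading up to a failure.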
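The time-window step can be sketched as follows, under assumptions the abstract leaves open: each filtered event is a `(timestamp, category, is_fatal)` tuple, a window's features are its per-category event counts, and a window is labeled positive (`+1`) when a fatal event occurs within the next `horizon` windows. The function name `build_windows`, the feature choice and the labeling rule are all hypothetical.

```python
from collections import Counter


def build_windows(events, window, categories, horizon=1):
    """Turn (timestamp, category, is_fatal) events into (features, label) samples.

    window:     window length in seconds (hypothetical; the thesis only says "a fixed size")
    categories: fixed list of event categories, one count feature per category
    horizon:    how many following windows are checked for a fatal event
    """
    events = list(events)
    if not events:
        return []
    start = min(t for t, _, _ in events)
    n_windows = int((max(t for t, _, _ in events) - start) // window) + 1
    counts = [Counter() for _ in range(n_windows)]
    fatal = [False] * n_windows
    for t, cat, is_fatal in events:
        k = int((t - start) // window)  # index of the window this event falls into
        counts[k][cat] += 1
        fatal[k] = fatal[k] or is_fatal
    samples = []
    # The last `horizon` windows have no complete future to label, so they are skipped.
    for k in range(n_windows - horizon):
        features = [counts[k][c] for c in categories]
        label = 1 if any(fatal[k + 1 : k + 1 + horizon]) else -1
        samples.append((features, label))
    return samples
```

With events `[(0, "net", False), (30, "net", False), (70, "mem", False), (130, "mem", True), (190, "net", False)]`, `window=60` and `categories=["net", "mem"]`, the call returns `[([2, 0], -1), ([0, 1], 1), ([0, 1], -1)]`: only the second window is positive, because the window after it contains the fatal event.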
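The AdaBoostSVM idea of boosting RBF-kernel SVMs and changing the kernel parameter when a component classifier is too weak can be sketched in plain Python. This is not the thesis's implementation. It assumes labels in {-1, +1}; trains each component SVM with kernel Pegasos (a stochastic sub-gradient SVM solver, here without a bias term) on examples drawn in proportion to the boosting weights; and, in the spirit of AdaBoostSVM formulations in the literature, shrinks the RBF width `sigma` by `sigma_step` whenever the weighted error reaches 0.5. The names `rbf`, `train_svm` and `adaboost_svm` and all default values are hypothetical.

```python
import math
import random


def rbf(a, b, sigma):
    """Gaussian (RBF) kernel between two feature vectors, with width sigma."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2 * sigma ** 2))


def train_svm(X, y, w, sigma, lam=0.01, iters=300, seed=0):
    """Kernel Pegasos SVM without a bias term (assumed solver, not from the thesis).

    Each step draws one example with probability proportional to its boosting
    weight and records an update when that example violates the margin.
    """
    rng = random.Random(seed)
    alpha = [0] * len(X)  # number of margin violations per example
    for t in range(1, iters + 1):
        i = rng.choices(range(len(X)), weights=w)[0]
        score = sum(alpha[j] * y[j] * rbf(X[j], X[i], sigma)
                    for j in range(len(X)) if alpha[j])
        if y[i] * score / (lam * t) < 1:
            alpha[i] += 1
    support = [(X[j], y[j], alpha[j]) for j in range(len(X)) if alpha[j]]
    # The positive factor 1 / (lam * iters) does not change the sign, so it is omitted.
    return lambda x: 1 if sum(a * yj * rbf(xj, x, sigma)
                              for xj, yj, a in support) >= 0 else -1


def adaboost_svm(X, y, sigma_init=10.0, sigma_min=0.1, sigma_step=0.5, rounds=10):
    """AdaBoost over RBF-SVM component classifiers with an adaptive kernel width."""
    n = len(X)
    w = [1.0 / n] * n
    sigma = sigma_init
    ensemble = []  # (vote weight, component classifier) pairs
    while len(ensemble) < rounds and sigma >= sigma_min:
        h = train_svm(X, y, w, sigma)
        preds = [h(x) for x in X]
        err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
        if err >= 0.5:
            sigma -= sigma_step  # no better than chance: narrow the kernel and retrain
            continue
        vote = 0.5 * math.log((1 - err) / max(err, 1e-10))
        ensemble.append((vote, h))
        if err == 0:
            break  # perfect on the weighted training set; more rounds add nothing
        # Standard AdaBoost re-weighting: misclassified examples gain weight.
        w = [wi * math.exp(-vote * yi * p) for wi, yi, p in zip(w, y, preds)]
        total = sum(w)
        w = [wi / total for wi in w]
    if not ensemble:
        raise ValueError("no component classifier beat a weighted error of 0.5")

    def predict(x):
        score = sum(v * hk(x) for v, hk in ensemble)
        return 1 if score >= 0 else -1

    return predict
```

The intent of starting from a wide kernel is to keep early component classifiers weak, and narrowing `sigma` only when a classifier fails to beat chance is what lets the kernel parameter adapt to the weighted training set. On two well-separated clusters, for instance `adaboost_svm([[0.0], [0.2], [0.4], [2.0], [2.2], [2.4]], [-1, -1, -1, 1, 1, 1], sigma_init=0.5)`, the returned predictor assigns points near each cluster that cluster's label.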
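Because the abstract singles out recall as the key metric, a short sketch of how recall and precision are computed for the failure class may help. Recall is TP / (TP + FN), the share of actual failure windows that the model predicts. Precision is TP / (TP + FP), the share of raised alarms that are real. The function name `precision_recall` and the convention that failure windows are labeled `1` are assumptions.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for the failure class (labeled `positive`, assumed to be 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # share of real failures that were predicted
    return precision, recall
```

For `y_true = [1, 1, 1, 1, -1, -1, -1, -1]` and `y_pred = [1, 1, -1, -1, 1, -1, -1, -1]`, there are 2 true positives, 1 false positive and 2 missed failures, so precision is 2/3 and recall is 0.5. A predictor that misses half of the failures therefore has recall 0.5 regardless of how precise its alarms are. This is one reason recall matters in fault prediction, where a missed failure usually costs more than a false alarm.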
[Degree-granting institution]: Chongqing University
[Degree level]: Master's
[Year degree awarded]: 2014
[CLC classification number]: TP338