Research and Implementation of Fault Early-Warning Technology for Server Clusters
Published: 2019-01-01 17:02
[Abstract]: As the Internet grows, server clusters keep scaling out smoothly, but the sheer number of server components greatly increases the probability of failure, which poses a serious challenge to the network management and availability of server clusters. Users urgently need real-time status monitoring of the cluster together with warnings issued before a fault occurs, and this capability is also an extremely important part of an integrated network management system. After analyzing the current state of server cluster early warning and studying existing fault monitoring models and techniques, this thesis designs and implements, in conjunction with the IPMI specification, a fault early-warning system for Linux server clusters. First, it designs a server device status monitoring model that monitors device hardware, system resources, system services, and application services. Second, it designs and implements a communication mechanism between the management side and the agent side using the SNMP and AgentX protocols. Finally, it designs and implements a cluster fault early-warning model, with a detailed design and implementation of each of its parts: a warning filtering model based on warning correlation relationships, a fault warning decision model, a fault warning notification model, and a device resource management model. The system monitors not only system software resource information but also brings server hardware resource information into the early-warning scope. Testing shows that the system meets the server cluster fault early-warning requirements of an integrated network management system and achieves real-time monitoring and fault early warning for server clusters.
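The abstract names a fault warning decision model and a warning filtering model based on correlation relationships, but does not say how the thesis implements them. The following is only a minimal sketch of the general idea, assuming a simple two-level threshold rule for the decision step and a cause-to-effect lookup table for the filtering step. The metric names, threshold values, the `CAUSES` table, and every function name are hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

# Hypothetical thresholds: metric name -> (warning level, critical level).
THRESHOLDS = {
    "cpu_temp_c": (75.0, 90.0),
    "disk_used_pct": (85.0, 95.0),
    "fan_rpm_deficit": (1000.0, 2500.0),
}

# Hypothetical correlation table: an active alert on the key metric is assumed
# to explain alerts on the listed metrics of the same host.
CAUSES = {
    "fan_rpm_deficit": {"cpu_temp_c"},
}


@dataclass(frozen=True)
class Alert:
    host: str
    metric: str
    level: str  # "warning" or "critical"
    value: float


def judge(host: str, metric: str, value: float) -> Optional[Alert]:
    """Decision step: return an Alert if the value crosses a threshold."""
    limits = THRESHOLDS.get(metric)
    if limits is None:
        return None  # unknown metric: no rule, so no alert
    warn, crit = limits
    if value >= crit:
        return Alert(host, metric, "critical", value)
    if value >= warn:
        return Alert(host, metric, "warning", value)
    return None


def filter_correlated(alerts: List[Alert]) -> List[Alert]:
    """Filtering step: drop alerts explained by another active alert on the same host."""
    active = {(a.host, a.metric) for a in alerts}
    suppressed = {
        (host, effect)
        for (host, cause) in active
        for effect in CAUSES.get(cause, ())
    }
    return [a for a in alerts if (a.host, a.metric) not in suppressed]


def evaluate(samples: Iterable[Tuple[str, str, float]]) -> List[Alert]:
    """Run the decision step on each (host, metric, value) sample, then filter."""
    alerts = [a for a in (judge(h, m, v) for h, m, v in samples) if a is not None]
    return filter_correlated(alerts)


samples = [
    ("node1", "fan_rpm_deficit", 1200.0),
    ("node1", "cpu_temp_c", 80.0),
    ("node2", "cpu_temp_c", 92.0),
    ("node2", "disk_used_pct", 50.0),
]
result = evaluate(samples)
```

In this sketch the temperature alert on `node1` is suppressed because the fan alert on the same host is treated as its cause, while the temperature alert on `node2` is kept because no related cause is active there. A real system would also need to handle cyclic correlation rules and time windows, which this sketch leaves out.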
[Degree-granting institution]: Xidian University
[Degree level]: Master's
[Year degree conferred]: 2014
[Classification number]: TP393.07
Article ID: 2397860
Article link: https://www.wllwen.com/guanlilunwen/ydhl/2397860.html