
Design and Implementation of a Data Center Cluster Monitoring System

Posted: 2018-06-24 22:58

Topic: cluster + monitoring; Source: master's thesis, China University of Geosciences (Beijing), 2012


[Abstract]: Ethernet bandwidth keeps increasing and the price of ordinary computers keeps falling. A cluster system uses ordinary PCs as nodes that form its basic computing units, connects them over a high-speed local area network, and relies on software to coordinate their work. Thanks to their high cost-performance ratio and good scalability, clusters have replaced traditional mainframes and supercomputers and are widely used in many industrial fields, such as information retrieval, text analysis, large-scale data mining, machine learning, and, more recently, cloud computing. As clusters become more widely used, operators keep adding nodes to raise their computing performance. Because a cluster is built from ordinary PCs whose reliability is limited, the probability that an individual node fails is high, and monitoring becomes increasingly important as the cluster grows. Monitoring shows which nodes have failed and stopped working, reports how heavily each node is utilized, and supports analysis of the whole cluster's operating trends, performance limits, and job bottlenecks, giving system administrators and the cluster's task scheduler a basis for their decisions.

This work comes from the Meridian Project data center and aims to monitor the cluster that performs numerical space-weather computation there. Based on the data center's specific requirements, this thesis designs and implements a cluster monitoring system with three functions. First, it collects, for every node, the system load, the time the processor spends in each state, memory usage, disk usage, network traffic, and other system metrics. Second, it aggregates each node's metrics, stores them in a database, and presents them to end users as web pages, where users can query and use them. Third, it analyzes the metrics quantitatively against user-configured value ranges; once an abnormal metric is detected, it is sent to the responsible staff through predefined notification rules for further handling, reducing avoidable losses. The system uses a client/server (C/S) architecture made up of an agent on every node, a number of aggregators, and a front-end display interface. It obtains monitoring data from /proc, transfers data as XML, uses RRDTool to plot trend graphs of numeric metrics, and keeps two kinds of back-end databases: RRD and MySQL.

The cluster monitoring system designed in this thesis monitors the Meridian Project data center stably and effectively, consumes few system resources, and responds quickly.
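The agent-side steps described in the abstract (read metrics from /proc, check them against user-configured ranges, and serialize them as XML for an aggregator) can be sketched in Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: the thesis does not give its implementation language or XML schema, so the `<host>`/`<metric>` element names, the function names, and the example limits are hypothetical. It reads only /proc/loadavg and /proc/meminfo and assumes "memory used" means MemTotal minus MemAvailable (falling back to MemFree on kernels older than 3.14). Network transfer to the aggregators and storage in RRD and MySQL are omitted.

```python
import os
import socket
import xml.etree.ElementTree as ET


def parse_loadavg(text):
    """Parse /proc/loadavg, e.g. '0.20 0.18 0.12 1/80 11206'.

    The first three fields are the 1-, 5- and 15-minute load averages.
    """
    fields = text.split()
    return {"load1": float(fields[0]), "load5": float(fields[1]), "load15": float(fields[2])}


def parse_meminfo(text):
    """Parse /proc/meminfo lines such as 'MemTotal:  16384000 kB'."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            info[key.strip()] = int(parts[0])  # value in kB (or a bare count)
    total = info["MemTotal"]
    # Assumption: MemAvailable (Linux >= 3.14) is the "free" figure; older kernels lack it.
    avail = info.get("MemAvailable", info.get("MemFree", 0))
    return {"mem_total_kb": total, "mem_used_pct": round(100.0 * (total - avail) / total, 1)}


def check_thresholds(metrics, limits):
    """Return the names of metrics outside their user-configured (low, high) range."""
    alerts = []
    for name, (low, high) in limits.items():
        value = metrics.get(name)
        if value is not None and not (low <= value <= high):
            alerts.append(name)
    return alerts


def to_xml(host, metrics):
    """Serialize one node's metrics into a hypothetical <host>/<metric> XML document."""
    root = ET.Element("host", name=host)
    for name, value in sorted(metrics.items()):
        ET.SubElement(root, "metric", name=name, value=str(value))
    return ET.tostring(root, encoding="unicode")


def read_proc(path):
    with open(path) as f:
        return f.read()


if __name__ == "__main__":
    metrics = {}
    if os.path.exists("/proc/loadavg"):  # only on Linux hosts
        metrics.update(parse_loadavg(read_proc("/proc/loadavg")))
        metrics.update(parse_meminfo(read_proc("/proc/meminfo")))
    limits = {"load1": (0.0, 8.0), "mem_used_pct": (0.0, 90.0)}  # hypothetical user settings
    print(to_xml(socket.gethostname(), metrics))
    print("alerts:", check_thresholds(metrics, limits))
```

On a Linux host the script prints one XML document for the local node followed by the metrics that fall outside their configured range. The parsing functions take plain strings rather than file paths so they can be tested without a live /proc.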
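[Degree-granting institution]: China University of Geosciences (Beijing)
[Degree level]: Master's
[Year degree awarded]: 2012
[Chinese Library Classification]: TP308; TP277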



