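Design and Implementation of Fault Tolerance for a Distributed Stream Computing Framework
Published: 2018-01-19 04:02
Keywords: S4; checkpointing; fault-tolerant computing; ZeroMQ; voting algorithm. Source: Beijing University of Posts and Telecommunications, master's thesis, 2013. Document type: degree thesis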
[Abstract]: Stream computing studies the processing of real-time data streams. Its premise is that the implicit value of data declines as time passes, so data should be processed as soon as possible after it appears. Among the available solutions, distributed stream computing frameworks are widely adopted in industry for their scalability, ease of use, and flexibility. This thesis designs and implements fault tolerance, the most critical capability of such a framework.

First, drawing on the research background and current research directions of distributed stream computing frameworks, the thesis explains why fault tolerance matters for these frameworks, proposes an approach to designing and implementing it, and introduces the key technologies required.

Next, taking the open-source distributed stream computing framework S4 (Simple Scalable Stream System) as its basis, the thesis presents a detailed requirements analysis of fault tolerance, covering three parts: processing-node fault tolerance, fault tolerance of network communication between processing nodes, and processing-node computation fault tolerance. The first ensures the reliability of the node itself, the second the reliability of network communication, and the third the reliability of the results produced by applications.

Finally, the thesis gives the detailed design and implementation of each fault-tolerance function. Processing-node fault tolerance uses warm-backup replication: the state of each processing node is periodically saved to an external database, MongoDB, and when a node is recovered, its most recently saved state is fetched from MongoDB and used to initialize the newly created node. Fault tolerance of network communication between processing nodes is implemented with the open-source networking library ZeroMQ, which copes well with both message errors and network connection failures. Processing-node computation fault tolerance uses multi-version software fault tolerance to mask the various faults that may occur while the system is running. The thesis closes by testing and analyzing the main fault-tolerance modules, indicating that these functions give the distributed stream computing system good reliability and robustness, and briefly summarizes the aspects that still need improvement and the next steps of the work.
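The two mechanisms described above, periodic checkpointing with recovery from the latest snapshot and masking faults across multiple software versions, can be illustrated with a minimal sketch. This is not the thesis's implementation. It assumes an in-memory dictionary in place of MongoDB, and it assumes strict-majority voting as the way the multi-version results are combined (the keywords mention a voting algorithm, but its details are not given here). The names `CheckpointStore`, `CountingNode`, and `majority_vote` are hypothetical.

```python
import copy
from collections import Counter


class CheckpointStore:
    """Hypothetical in-memory stand-in for the external database (MongoDB in the thesis)."""

    def __init__(self):
        self._snapshots = {}  # node_id -> most recently saved state

    def save(self, node_id, state):
        # Deep-copy so later changes to the live state do not alter the snapshot.
        self._snapshots[node_id] = copy.deepcopy(state)

    def load(self, node_id):
        # Return a copy of the latest snapshot, or None if none was ever saved.
        return copy.deepcopy(self._snapshots.get(node_id))


class CountingNode:
    """Hypothetical processing node that counts events per key."""

    def __init__(self, node_id, store, state=None):
        self.node_id = node_id
        self.store = store
        self.state = state if state is not None else {}

    def process(self, key):
        self.state[key] = self.state.get(key, 0) + 1

    def checkpoint(self):
        # In a real deployment this would be triggered by a timer.
        self.store.save(self.node_id, self.state)

    @classmethod
    def recover(cls, node_id, store):
        # A replacement node is initialized from the most recent checkpoint.
        return cls(node_id, store, store.load(node_id))


def majority_vote(results):
    """Return the value produced by a strict majority of versions, else None."""
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    return value if count * 2 > len(results) else None
```

In this sketch, a node that processes events, calls `checkpoint()`, and then fails loses only the events handled after the checkpoint: `CountingNode.recover(node_id, store)` rebuilds it from the saved state. Likewise, `majority_vote` accepts a result only when more than half of the versions agree, so a single faulty version out of three is masked, while a three-way disagreement yields `None`.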
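[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP302.8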