A Real-Time Network Event Awareness System Based on a Distributed Framework
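Published: 2018-01-11 03:10
Keywords: real-time network event awareness system based on a distributed framework. Source: Zhejiang University, 2017 master's thesis. Document type: degree thesis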
[Abstract]: With the growth of the Internet, the volume of data has outstripped any individual's capacity to extract comprehensive and accurate information and thereby track trends in a particular domain. Event awareness addresses this by using events as the carrier of information: event information is extracted from continuously arriving documents and merged into what is already known, so that users are presented with both aggregate statistics and detailed analysis that support their decisions. Mature event awareness systems today rely on large computing clusters and combine stream and batch processing to operate at big-data scale. This thesis instead targets small clusters, where event results must be obtained and queried in real time. It implements the functions of an event awareness application through stream processing, with the goals of raising overall processing efficiency and minimizing the impact on the algorithms. To these ends, a distributed processing platform is designed and developed to meet the application's requirements at every stage. The main contributions are: 1) The inputs, outputs, and users of an event awareness application are analyzed, the system is divided into three modules, and the overall architecture is designed. 2) For the storage module, storage formats are designed, including the representation of data in MongoDB and the NAF annotation format. 3) For the processing module, the two traditional types of event awareness task are extended to run in a distributed fashion on streaming data, and a topology design is proposed for each. In addition, the topology scheduler of the Storm computing framework on which the system runs is optimized, and a persistence strategy for in-memory data is designed that meets the fault-tolerance requirements of event awareness. 4) For the analysis and service module, response strategies are designed for different query types, and the query back end uses a closed-cube-based method for dimension statistics in a distributed in-memory environment. Finally, the system's usability and performance are validated against a real inspection and quarantine application.
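The closed-cube idea mentioned in the abstract can be illustrated with a small sketch. The abstract does not describe the thesis's distributed algorithm, so what follows is only a single-process, brute-force illustration for a count measure. The dimension values (product type, country, year) and the function names `closed_cube` and `query` are hypothetical. A cell is treated as closed when no more specific cell covers exactly the same rows, and the count of any cell can then be recovered from the closed cells alone.

```python
from collections import defaultdict
from itertools import product

ALL = "*"  # marks a dimension that has been aggregated away


def closed_cube(rows):
    """Return {closed_cell: count} for a list of equal-length tuples."""
    if not rows:
        return {}
    n_dims = len(rows[0])
    # Group rows under every cell that covers them (brute force: 2^n cells per row).
    covered = defaultdict(list)
    for row in rows:
        for mask in product((False, True), repeat=n_dims):
            cell = tuple(v if keep else ALL for v, keep in zip(row, mask))
            covered[cell].append(row)
    closed = {}
    for cell, members in covered.items():
        # A cell is closed if, on every aggregated dimension, its rows disagree;
        # otherwise a more specific cell would cover the same rows.
        is_closed = all(
            len({m[d] for m in members}) > 1
            for d in range(n_dims)
            if cell[d] == ALL
        )
        if is_closed:
            closed[cell] = len(members)
    return closed


def query(closed, cell):
    """Count for any cell: the largest count among closed cells that specialize it."""
    matches = [
        cnt
        for c, cnt in closed.items()
        if all(q == ALL or q == v for q, v in zip(cell, c))
    ]
    return max(matches, default=0)


# Hypothetical event records: (product type, country, year).
rows = [
    ("fruit", "CN", "2017"),
    ("fruit", "CN", "2016"),
    ("meat", "US", "2017"),
]
cube = closed_cube(rows)
print(len(cube))                           # 6 closed cells
print(query(cube, ("fruit", ALL, ALL)))    # 2
print(query(cube, (ALL, "US", ALL)))       # 1
```

In this toy data set only 6 cells need to be stored, yet every count query is still answerable. A distributed variant would need to partition the rows and merge partial results, which this sketch does not attempt.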
[Degree-granting institution]: Zhejiang University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP311.13