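Design and Implementation of a Storm-Based Data Stream Query and Analysis System
Published: 2018-05-06 16:41
Topics: data stream analysis + data stream query; Source: Harbin Institute of Technology, master's thesis, 2017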
[Abstract]: Data stream analysis has long been an active research topic, and the growth of big data over the past decade has made it increasingly important and widely used. At present, however, easy-to-use data stream analysis systems are scarce, and those that exist are hard to get started with and require specialist staff. Taking the laboratory's data stream analysis project as its background, this thesis presents a general-purpose data stream analysis system based on Storm. To address the problems above, it analyzes the requirements of data stream development, proposes and implements scql, an SQL-like data stream analysis language, and deploys the generated logical application on Storm. The system is simple to use: anyone who knows SQL can use it after briefly learning the scql syntax, and there is no need to manage large amounts of configuration. Extensive testing shows that the system is feasible and effective. The system consists of three modules: a base module, an adapter module, and a compilation module. The base module provides the data processing classes; the adapter module deploys the logical application on Storm; the compilation module takes scql statements through syntax analysis, semantic analysis, operator splitting and merging, and operator optimization, and finally compiles and builds a logical application. Syntax analysis extracts the information of each leaf node from the abstract syntax tree; semantic analysis then reorganizes this information to create table metadata, statement analysis results, and expression descriptions. The next step splits operators according to the statements and creates executors, after which the physical execution plan is generated. In summary, the thesis introduces the background of data streams and analyzes related technologies, then analyzes the system requirements and proposes the overall architecture and design. The core chapters describe the design and implementation of the system in detail and present test cases.
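The compilation stages named in the abstract (syntax analysis, semantic analysis, operator splitting, execution) can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the abstract does not give the scql grammar, so the `SELECT ... FROM ... WHERE` statement shape, the function names `parse`, `check`, `split_operators` and `run`, and the `events` schema are all hypothetical, and the operators run in-process over a Python list rather than being deployed to Storm.

```python
import operator
import re
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]
Operator = Callable[[Iterable[Row]], Iterator[Row]]

# Hypothetical statement shape; the real scql grammar is not given in the abstract.
_STATEMENT = re.compile(
    r"^\s*SELECT\s+(?P<cols>[\w\s,]+?)\s+FROM\s+(?P<stream>\w+)"
    r"(?:\s+WHERE\s+(?P<field>\w+)\s*(?P<op>>=|<=|>|<|=)\s*(?P<value>-?\d+))?\s*$",
    re.IGNORECASE,
)
_COMPARATORS = {
    ">": operator.gt, "<": operator.lt,
    ">=": operator.ge, "<=": operator.le, "=": operator.eq,
}


def parse(statement: str) -> dict:
    """Syntax analysis: extract the columns, stream name and optional predicate."""
    match = _STATEMENT.match(statement)
    if match is None:
        raise ValueError(f"cannot parse statement: {statement!r}")
    predicate = None
    if match.group("field"):
        predicate = (match.group("field"), match.group("op"), int(match.group("value")))
    columns = [name.strip() for name in match.group("cols").split(",")]
    return {"columns": columns, "stream": match.group("stream"), "predicate": predicate}


def check(plan: dict, schemas: Dict[str, List[str]]) -> None:
    """Semantic analysis: every referenced field must exist in the stream's schema."""
    if plan["stream"] not in schemas:
        raise ValueError(f"unknown stream: {plan['stream']}")
    known = set(schemas[plan["stream"]])
    referenced = list(plan["columns"])
    if plan["predicate"]:
        referenced.append(plan["predicate"][0])
    unknown = [name for name in referenced if name not in known]
    if unknown:
        raise ValueError(f"unknown fields: {unknown}")


def split_operators(plan: dict) -> List[Operator]:
    """Operator splitting: an optional filter followed by a projection."""
    operators: List[Operator] = []
    if plan["predicate"]:
        field, op, value = plan["predicate"]
        compare = _COMPARATORS[op]

        def filter_op(rows: Iterable[Row]) -> Iterator[Row]:
            return (row for row in rows if compare(row[field], value))

        operators.append(filter_op)
    columns = plan["columns"]

    def project_op(rows: Iterable[Row]) -> Iterator[Row]:
        return ({name: row[name] for name in columns} for row in rows)

    operators.append(project_op)
    return operators


def run(operators: List[Operator], rows: Iterable[Row]) -> Iterator[Row]:
    """Execution: chain the operators lazily over the incoming rows."""
    for op in operators:
        rows = op(rows)
    return iter(rows)


# Usage with a hypothetical "events" stream.
schemas = {"events": ["user", "clicks"]}
plan = parse("SELECT user FROM events WHERE clicks > 10")
check(plan, schemas)
pipeline = split_operators(plan)
stream = [{"user": "a", "clicks": 3}, {"user": "b", "clicks": 42}]
result = list(run(pipeline, stream))
print(result)  # [{'user': 'b'}]
```

In a Storm deployment each operator would plausibly map to a bolt in a topology, but how the thesis's adapter module performs that mapping is not stated in the abstract, so the sketch stops at the logical operator chain.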
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP311.13