Design and Implementation of a Data Collection and Analysis Platform Based on Distributed Storage
Published: 2019-06-15 00:59
[Abstract]: With the rapid growth of Web 2.0, the mobile Internet, and the Internet of Things, data volumes are growing explosively, and research and development of big data platforms for massive data has become a focus of the industry. Data collection, storage, and analysis are the core problems of a big data platform. Data collection addresses access to diverse data sources; data storage addresses the diversity of data formats and the storage of massive volumes of data; data analysis aims to offer a variety of algorithms, which requires an analysis engine whose algorithms are easy to plug in and remove. Based on a survey, analysis, and application of mainstream open-source big data frameworks, this thesis proposes a data collection and analysis platform based on distributed storage. The platform collects data through a unified interface, integrates multiple storage methods to provide data storage, and offers diverse data analysis through a plugin-based algorithm engine. The thesis first surveys mainstream open-source frameworks and technology platforms, analyzes the technical characteristics of distributed systems, and sets out the functional and non-functional requirements of a data collection and analysis platform for big data. It then presents the iterative design process of the platform, describing each stage of the evolution of its architecture in detail. Next, it describes the implementation and testing of the platform and reports both functional and non-functional test results. Finally, it presents two concrete application examples that further validate the platform's effectiveness.
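[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: TP274.2; TP311.52
Article ID: 2499844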
Article link: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/2499844.html