Design and Implementation of a Network Data Analysis System Based on the Spark Platform
Published: 2018-11-17 06:59
[Abstract]: With the rapid development of Internet technology, content delivery networks (CDNs) play an important role in the Internet architecture, and users' browsing records are captured in the network logs of CDN service providers. The major CDN vendors share a set of common needs for analyzing massive network data, and their non-technical staff, such as PMs and operations personnel, need to perform routine analyses on this data. The market currently lacks a general-purpose network data analysis service platform aimed at CDN service providers, so there is an urgent need for one that CDN vendors can use without first having to learn a big data platform. To design a general-purpose, easy-to-use, and easily extensible service platform for analyzing massive network data, this thesis builds on existing distributed frameworks to design and implement a network data analysis service platform based on Spark. The main work is as follows: (1) Preprocessing and analysis of massive network data are implemented with Spark; based on the characteristics of network data, a network data analysis service tool is designed and implemented. (2) Techniques for exposing a big data platform through the Web are studied, mainly how to browse network data held in a distributed storage engine from a Web platform and how to run massive network data analysis jobs through that platform. (3) Based on YARN's management of the whole big data platform, the relationship between the resource manager YARN and the compute engine Spark is analyzed, and a way of monitoring the platform's Spark jobs by monitoring YARN is studied, so as to ensure the availability of the overall system. (4) Visualization of big data analysis results is studied; after surveying third-party visualization plug-ins, the thesis proposes using ECharts to render the analysis results on Web pages. Using the solutions obtained from this research, the thesis implements Spark-based data analysis functions and a Web front end for the big data platform, and verifies their effectiveness experimentally. On the basis of these key techniques, the thesis completes the development of the network data analysis service platform, which offers users network data analysis, network data preview, visualization of result data, and system monitoring. It provides a platform for understanding users' online behavior and creates the conditions for website providers and CDN vendors to optimize their own services.
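As a rough illustration of item (3), monitoring Spark jobs by monitoring YARN, the sketch below filters Spark applications out of a response shaped like the one returned by the YARN ResourceManager REST endpoint `/ws/v1/cluster/apps`. This is not the thesis's implementation: the sample payload, the application names, and the helper names `summarize_spark_apps` and `find_unhealthy` are assumptions made for illustration, and a real deployment would fetch the JSON from the cluster's own ResourceManager address.

```python
import json
from typing import Any, Dict, List

# Hypothetical payload shaped like the YARN ResourceManager REST response
# for GET /ws/v1/cluster/apps. All values are made up for illustration.
SAMPLE_RESPONSE = json.dumps({
    "apps": {
        "app": [
            {"id": "application_1_0001", "name": "cdn-log-etl",
             "applicationType": "SPARK", "state": "RUNNING",
             "finalStatus": "UNDEFINED", "progress": 42.0},
            {"id": "application_1_0002", "name": "cdn-top-urls",
             "applicationType": "SPARK", "state": "FINISHED",
             "finalStatus": "FAILED", "progress": 100.0},
            {"id": "application_1_0003", "name": "legacy-job",
             "applicationType": "MAPREDUCE", "state": "RUNNING",
             "finalStatus": "UNDEFINED", "progress": 10.0},
        ]
    }
})


def summarize_spark_apps(payload: str) -> List[Dict[str, Any]]:
    """Return a compact status record for each Spark application."""
    body = json.loads(payload)
    # The "apps" field can be null when the cluster has no applications.
    apps = (body.get("apps") or {}).get("app", [])
    return [
        {
            "id": app["id"],
            "name": app.get("name", ""),
            "state": app.get("state", "UNKNOWN"),
            "finalStatus": app.get("finalStatus", "UNDEFINED"),
            "progress": app.get("progress", 0.0),
        }
        for app in apps
        if app.get("applicationType") == "SPARK"
    ]


def find_unhealthy(apps: List[Dict[str, Any]]) -> List[str]:
    """Return the ids of Spark applications that failed or were killed."""
    bad_states = {"FAILED", "KILLED"}
    return [
        app["id"]
        for app in apps
        if app["state"] in bad_states or app["finalStatus"] in bad_states
    ]


if __name__ == "__main__":
    spark_apps = summarize_spark_apps(SAMPLE_RESPONSE)
    for app in spark_apps:
        print(app["id"], app["state"], app["finalStatus"], app["progress"])
    print("needs attention:", find_unhealthy(spark_apps))
```

A Web monitoring page could poll such a summary periodically and flag the applications reported by `find_unhealthy`; how the thesis's platform actually surfaces this information is not described in the abstract.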
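[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP393.09; TP311.13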
Article ID: 2336875
Article link: https://www.wllwen.com/guanlilunwen/ydhl/2336875.html