Design and Implementation of NetEase's Mobile Big Data Support Platform Based on Spark and Hive

Published: 2018-05-30 02:19

Topics: precise decision support; big data support platform. Source: Beijing Jiaotong University, master's thesis, 2017

[Abstract]: For Internet products, accurate decision-making and operations determine whether a product thrives or declines. At present, decisions and operations for Internet products rely mainly on the experience of product and operations teams, yet people who can make genuinely accurate decisions are rare. Traditional operations approaches are not only costly but also suffer from problems such as low user stickiness. As an Internet pioneer, NetEase holds rich data in each of its business lines, but data analysis within each line is fragmented, narrow in scope, and lacks generality. The company therefore urgently needs a platform that integrates its internal data resources and provides relevant departments with unified, accurate data analysis support services. Against this background, the mobile big data support platform was developed; it offers a way to overcome the limitations of traditional decision-making and operations approaches and to improve the reuse of company resources. The platform is built on distributed components, including Kafka, HDFS, Spark, Hive, and HDFS Connector, and its backend services are supported by Java Web frameworks such as Spring and MyBatis. The platform consists of an infrastructure subsystem, a data analysis subsystem, and a data mining subsystem. During my internship, I was mainly responsible for designing and developing the data analysis subsystem, which comprises five modules: user behavior analysis, transaction behavior analysis, page heatmaps, path analysis, and auxiliary services. The first four modules are implemented jointly by the data analysis part and the backend service part; the auxiliary service module is implemented by the backend service part alone. The data analysis part uses big data analysis technologies, and the backend service part uses Java Web technologies.

This thesis first introduces the key technologies used to develop the platform, then gives an overview of the platform's functions, analyzes the functional requirements of the modules I was responsible for, and describes the system's non-functional requirements. It then presents the system's technical architecture design, functional architecture design, and database design. Next, it focuses on the detailed design and implementation of the platform and shows screenshots of each module in operation. Finally, it explains the testing and verification methods used for the platform. The mobile big data support platform gives many NetEase products a simple, convenient integration point and accurate data analysis support services, increases the reuse of internal data resources, and improves the efficiency of product, operations, and other related departments. It also supplies data to the data mining team, helping it extract further value from the data and increase revenue for the department and the company.
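The abstract lists path analysis as one module of the data analysis subsystem but does not describe how it is computed. The following is a minimal plain-Python sketch of one common form of path analysis: counting page-to-page transitions within each user's sessions. The event schema (`PageView` with `user_id`, `ts`, `page`), the function name `transition_counts`, and the 30-minute session gap are all hypothetical assumptions, not taken from the thesis. In the platform, the analysis part runs on Spark and Hive; this sketch does not use Spark and only illustrates the aggregation logic in memory.

```python
from collections import Counter, defaultdict
from typing import Iterable, NamedTuple


class PageView(NamedTuple):
    """One page-view event. Hypothetical schema, not the thesis's actual table."""
    user_id: str
    ts: int  # seconds since epoch
    page: str


# Assumed inactivity threshold: a gap longer than this starts a new session.
SESSION_GAP = 30 * 60


def transition_counts(events: Iterable[PageView]) -> Counter:
    """Count (from_page, to_page) transitions inside each user's sessions."""
    # Group events by user so each user's path is analyzed independently.
    by_user = defaultdict(list)
    for e in events:
        by_user[e.user_id].append(e)

    counts: Counter = Counter()
    for views in by_user.values():
        views.sort(key=lambda v: v.ts)  # order each user's views by time
        for prev, cur in zip(views, views[1:]):
            # A gap longer than SESSION_GAP ends the session, so the
            # jump across it is not counted as a transition.
            if cur.ts - prev.ts <= SESSION_GAP:
                counts[(prev.page, cur.page)] += 1
    return counts


if __name__ == "__main__":
    log = [
        PageView("u1", 0, "home"),
        PageView("u1", 60, "detail"),
        PageView("u1", 120, "cart"),
        PageView("u2", 10, "home"),
        PageView("u2", 50, "detail"),
        PageView("u2", 5000, "home"),  # more than 30 minutes later: new session
    ]
    for (src, dst), n in transition_counts(log).most_common():
        print(f"{src} -> {dst}: {n}")
```

In a distributed job the same logic maps onto partitioning events by user and ordering them by timestamp within each partition; the session threshold is a tunable assumption rather than a fixed rule.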
[Degree-granting institution]: Beijing Jiaotong University
[Degree level]: Master's
[Year awarded]: 2017
[CLC classification number]: TP311.52; TP311.13

【参考文献】