Statistics and Applications of Massive Data from Large Websites Based on Hadoop

Published: 2018-11-19 09:26
[Abstract]: With the rapid growth of the Internet, people rely on the network for an ever wider range of needs. These needs differ from user to user, however, and tend to reflect individual preferences. On a website, user actions leave log data on the back end, and the volume of that data is massive. Processing and aggregating these logs is the most effective way to analyze user behavior, derive user attributes, and measure the effect of advertising campaigns.

Massive data processing has been studied extensively, and a number of open source software frameworks have emerged. The most popular is Hadoop, a distributed software framework that processes massive data efficiently. Hive, a data warehouse framework built on top of Hadoop, handles massive data efficiently as well. Hadoop now receives wide attention from the engineering and development community.

The user behavior analysis project of a large website was set up to analyze user behavior. It processes the site's massive log data in order to mine user behavior characteristics, user attributes, and advertising attributes, using Hadoop and Hive for the processing. The project consists of the following parts: user population classification, overall data statistics, advertising data statistics, cookie overlap statistics, brand probe, and whole-network statistics. Once the statistical results of these parts are available, the project moves to the data analysis stage, where relevant information is mined to help shape operating strategy.

This thesis describes the design and implementation of these parts in detail and gives a brief analysis of some of them. It first introduces the project background and the relevant Hadoop technologies, then presents the project goals and details of the data. Next it describes the role of each part and how Hadoop is used to process the massive data. Finally, it gives a brief summary, pointing out shortcomings and areas that could be optimized.
[Degree-granting institution]: Nanjing University
[Degree level]: Master's
[Year degree awarded]: 2012
[Classification number]: TP393.092; TP311.5



