Design and Implementation of a Web User Behavior Data Collection and Statistics System
Published: 2018-01-29 03:48
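Keywords: website traffic analysis; behavior data collection; automatic JavaScript embedding; Netty. Source: master's thesis, Beijing Jiaotong University, 2015. Thesis type: degree thesis.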
[Abstract]: With the arrival of the Internet era, the network has become part of everyday life, and online shopping has become a widely accepted way to buy goods. The sharp rise in the number of online shoppers has led e-commerce websites to invest more in attracting users and generating revenue. For an e-commerce site, good design and a satisfying shopping experience are essential to the business, which makes website analysis indispensable. Understanding how users visit a site requires comprehensive, detailed data on their browsing behavior. From a big-data perspective, large volumes of such data make website analysis more insightful, and seemingly insignificant records may reveal latent value.

Many third-party, even free, website analysis tools exist, but they are inconvenient to apply in practice. Google Analytics, which uses the JavaScript page-tagging method, requires every page to be modified to include its JavaScript code, and capturing a particular kind of user behavior requires extensive page changes to add event-tracking code. Data capture therefore becomes laborious and hard to manage, and the statistics are not available in real time. Server logs, for their part, cannot track events and must be filtered before use. This thesis focuses on a user behavior data collection and statistics system that collects behavior data with JavaScript page tags but requires no manual page changes: an Nginx module automatically embeds the appropriate JavaScript into each category of page. Event-tracking JavaScript is managed centrally, which keeps it easy to maintain. The data collection server is built on Netty, processes large volumes of data quickly, and forwards behavior data to the MetaQ message middleware.
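The abstract says only that an Nginx module embeds the right JavaScript into each category of page. One way to do this with stock Nginx is the `sub_filter` directive of `ngx_http_sub_module`, which replaces one string with another in responses; the thesis does not say which module it actually used. The Python sketch below shows the logic such a filter performs under assumed names: the path prefixes and script URLs in `SCRIPT_BY_PAGE_TYPE` are hypothetical, and the script tag is placed just before the closing `</body>`.

```python
import re

# Hypothetical mapping from URL path prefix to the tracking script for that page type.
# The thesis does not publish its real page categories or script names.
SCRIPT_BY_PAGE_TYPE = {
    "/goods/": "/js/track-goods.js",
    "/cart/": "/js/track-cart.js",
    "/order/": "/js/track-order.js",
}
DEFAULT_SCRIPT = "/js/track-common.js"

# Matches a closing body tag regardless of case, e.g. </body> or </BODY >.
_BODY_CLOSE = re.compile(r"</body\s*>", re.IGNORECASE)


def pick_script(path: str) -> str:
    """Return the tracking script for a request path; the longest matching prefix wins."""
    matches = [prefix for prefix in SCRIPT_BY_PAGE_TYPE if path.startswith(prefix)]
    if not matches:
        return DEFAULT_SCRIPT
    return SCRIPT_BY_PAGE_TYPE[max(matches, key=len)]


def inject_tracker(html: str, path: str) -> str:
    """Insert a <script> tag just before the last closing body tag, as a response filter would."""
    closes = list(_BODY_CLOSE.finditer(html))
    if not closes:
        return html  # not a full HTML page (fragment, JSON, ...): leave it untouched
    idx = closes[-1].start()
    tag = f'<script src="{pick_script(path)}" async></script>'
    return html[:idx] + tag + html[idx:]
```

Keeping the page-type mapping in one place is what makes the event-tracking code easy to maintain: changing how one page category is tracked means editing one script, not every page.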
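The collection server is described only as Netty-based. In Netty, a request handler would typically decode the query string (for example with `QueryStringDecoder`) and hand the result to the message queue. The sketch below shows that per-request step in Python under an assumed beacon format: the `/c.gif` path and the `vid`, `url`, and `evt` parameters are hypothetical, not the thesis's actual wire format.

```python
import json
import time
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Hypothetical required beacon parameters: visitor id, page URL, event type.
REQUIRED_FIELDS = ("vid", "url", "evt")


def parse_beacon(request_uri: str, client_ip: str, user_agent: str,
                 now: Optional[float] = None) -> Optional[dict]:
    """Turn a tracking-pixel request such as /c.gif?vid=..&url=..&evt=pv into a record.

    Returns None when a required field is missing or empty, so malformed hits
    are dropped before they reach the message queue.
    """
    query = parse_qs(urlsplit(request_uri).query)
    # parse_qs maps each name to a list of values; keep the first one.
    fields = {name: values[0] for name, values in query.items()}
    if any(not fields.get(name) for name in REQUIRED_FIELDS):
        return None
    # Enrich with data the browser does not send itself.
    fields["ip"] = client_ip
    fields["ua"] = user_agent
    fields["ts"] = int(now if now is not None else time.time())
    return fields


def to_message(record: dict) -> bytes:
    """Serialize a record as one UTF-8 JSON document, a typical message payload."""
    return json.dumps(record, ensure_ascii=False, sort_keys=True).encode("utf-8")
```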
The system computes statistics on behavior data in two ways: customized periodic reports built with Hive, and real-time statistics and display built with Storm. The two paths pull messages from MetaQ independently without affecting each other, which decouples the data collection server from both.

The author's work on the project covered research into methods for collecting user behavior data and the implementation of the behavior data collection module and the data collection and storage module. On the statistics side, the author helped develop the Hive jobs that generate various operational statistics reports, so the Storm real-time statistics are not described in this thesis. The system currently provides behavior data statistics for platforms including the China Unicom online mall and mobile mall. An existing task scheduling system generates reports daily or periodically and sends them to the relevant staff. In current practice, data storage on HDFS is also close to real time, so real-time queries over behavior data can be used to monitor aspects of site status; when an anomaly occurs, an alert is sent to developers through an SMS interface.
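The decoupling argument rests on the pull model: each statistics path reads the same message stream at its own pace. The in-memory `Topic` class below is a hypothetical stand-in that illustrates per-consumer-group offsets; it is not MetaQ's client API, and the group names used with it are illustrative.

```python
from collections import defaultdict


class Topic:
    """In-memory stand-in for a log-based, pull-model topic (not MetaQ's API).

    Messages are appended once; each consumer group keeps its own read offset,
    so a slow batch consumer never blocks a real-time one.
    """

    def __init__(self) -> None:
        self._log: list = []
        self._offsets: defaultdict = defaultdict(int)  # consumer group -> next offset

    def publish(self, message) -> None:
        """Called by the collection server; it never needs to know who consumes."""
        self._log.append(message)

    def pull(self, group: str, max_messages: int = 100) -> list:
        """Return up to max_messages unread messages for this group and advance its offset."""
        start = self._offsets[group]
        batch = self._log[start:start + max_messages]
        self._offsets[group] = start + len(batch)
        return batch
```

Because each group advances only its own offset, a batch job that runs once a night does not delay the real-time consumer, and adding a third consumer requires no change to the collection server.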
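As an example of the kind of periodic report the Hive path could produce, the sketch below computes daily page views (PV) and unique visitors (UV) per URL. The column names `dt`, `url`, `vid`, and `evt` are assumptions, not the thesis's schema; the docstring shows the roughly equivalent HiveQL `GROUP BY` query.

```python
from collections import defaultdict


def daily_pv_uv(records):
    """Compute (PV, UV) per (day, url) from page-view events.

    Roughly equivalent HiveQL over a hypothetical events table:
        SELECT dt, url, COUNT(*), COUNT(DISTINCT vid)
        FROM events WHERE evt = 'pv' GROUP BY dt, url;
    """
    pv = defaultdict(int)
    visitors = defaultdict(set)
    for r in records:
        if r["evt"] != "pv":
            continue  # only page-view events count toward PV/UV
        key = (r["dt"], r["url"])
        pv[key] += 1
        visitors[key].add(r["vid"])
    return {key: (pv[key], len(visitors[key])) for key in pv}
```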
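The abstract mentions SMS alerts when real-time queries reveal an anomaly but does not define what counts as one. Assuming a simple rule (current page views falling below a fraction of a baseline), one monitoring check could look like the sketch below; `send_sms` is a hypothetical callback standing in for the SMS interface, and the 0.5 threshold is arbitrary.

```python
from typing import Callable, Optional


def check_traffic(current_pv: int, baseline_pv: float, drop_ratio: float = 0.5) -> Optional[str]:
    """Return an alert message when PV falls below drop_ratio * baseline, else None."""
    if baseline_pv <= 0:
        return None  # no history yet: nothing to compare against
    if current_pv < drop_ratio * baseline_pv:
        return f"PV dropped to {current_pv} (baseline {baseline_pv:.0f})"
    return None


def monitor(current_pv: int, baseline_pv: float,
            send_sms: Callable[[str], None], drop_ratio: float = 0.5) -> bool:
    """Run one check and pass any alert to the (hypothetical) SMS callback; True if alerted."""
    message = check_traffic(current_pv, baseline_pv, drop_ratio)
    if message is None:
        return False
    send_sms(message)
    return True
```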
[Degree-granting institution]: Beijing Jiaotong University
[Degree level]: Master's
[Year awarded]: 2015
[CLC classification number]: TP311.52; TP393.092
Article ID: 1472431
Article link: https://www.wllwen.com/guanlilunwen/ydhl/1472431.html