A Study of User Browsing Behavior Based on Path and Page Mining
Published: 2018-10-21 12:50
[Abstract]: When users interact with Internet products, and especially when they browse the web, they generate large volumes of behavioral data. How to mine and analyze this data in depth, in order to understand users' behavior, psychology, and preferences and so improve products and the user experience, has become a topic of interest for many Internet companies. An effective way to study the browsing behavior of Internet users is to collect the web logs produced during browsing and analyze that behavior through web log mining, an approach that many researchers have already applied successfully. Building on this earlier work, this thesis combines the currently popular Hadoop platform with data warehouse technology to make web-log-based user behavior analysis systematic and engineered, so that it can be applied as a project in the day-to-day production of Internet companies and effectively support their product development, operations, and management. Based on path and page mining, the thesis studies users' page browsing behavior in four parts:
(1) It introduces the Hadoop data processing platform and the Hive data warehouse. Through distributed storage and computation, the platform can analyze massive data quickly and effectively. Based on the characteristics of the Hive data warehouse, a data-warehouse-based framework for studying user browsing behavior is proposed.
(2) It builds a base data layer and a subject layer on the data warehouse; the subject layer consists mainly of the user browsing behavior subject.
(3) Drawing on a study of association rule algorithms and common path mining algorithms, it proposes Hive-CFAP, a data-warehouse-based algorithm for mining frequent access paths.
(4) Using the user browsing behavior subject and the Hive-CFAP algorithm, it analyzes and applies users' frequent access paths, the relationship between page views and page distance, and the clustering of users with similar browsing behavior.
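The abstract names frequent access path mining but does not describe how Hive-CFAP works, so the sketch below is not that algorithm. It only illustrates the general idea, assuming the web log has already been split into per-user sessions: count every contiguous page sequence and keep those that appear in at least a minimum number of sessions. The function name `frequent_paths`, the parameters `min_support` and `max_len`, and the sample page names are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Path = Tuple[str, ...]


def frequent_paths(
    sessions: List[List[str]], min_support: int, max_len: int = 3
) -> Dict[Path, int]:
    """Return contiguous page sequences of length 2..max_len that occur
    in at least `min_support` sessions, with their session counts."""
    counts: Counter = Counter()
    for pages in sessions:
        seen = set()
        for n in range(2, max_len + 1):
            # Slide a window of n pages over the session.
            for i in range(len(pages) - n + 1):
                seen.add(tuple(pages[i:i + n]))
        # A session contributes at most one count per distinct path.
        counts.update(seen)
    return {path: c for path, c in counts.items() if c >= min_support}


# Hypothetical sessions, one list of visited pages per session.
sessions = [
    ["home", "list", "item", "cart"],
    ["home", "list", "item"],
    ["home", "search", "item", "cart"],
]
result = frequent_paths(sessions, min_support=2)
for path, support in sorted(result.items()):
    print(" -> ".join(path), support)
```

In a Hive setting the same counting would more likely be written as a grouped aggregation over a sessionized log table. The in-memory version here is only meant to make the notion of path support concrete.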
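[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree granted]: 2014
[Classification number]: TP393.092
Article ID: 2285141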
Article link: https://www.wllwen.com/guanlilunwen/ydhl/2285141.html