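Research on Hadoop-Based User Browsing Path Mining Technology
Published: 2018-03-01 11:48
Keywords: Web logs; preferred browsing paths; MapReduce; Hadoop. Source: Hunan University of Technology, 2015 master's thesis. Type: degree thesis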
[Abstract]: The data explosion brought about by the growth of the Internet has left Web servers with large volumes of accumulated log data, and mining valuable information from massive Web logs has become an active research topic. Effective analysis and mining of Web logs can reveal users' preferred browsing paths, which both provides a reference for optimizing a website's topology and gives enterprises a basis for formulating better marketing strategies. This thesis studies Hadoop-based user browsing path mining. The work covers the following three aspects.
1. An algorithm for mining users' preferred browsing paths based on credible interest degree is proposed and implemented. Taking full account of how interested users are in a page while browsing it, the thesis proposes the concept of page interest degree. Combining the factors behind users' choice of browsing path, the position at which a page is placed, and the reasons other pages link to it, and using the website topology graph to revise the weighted measurement criterion, it proposes the concept of credible selection degree. The credible selection degree and the page interest degree are then combined into a single measure, the credible interest degree, on which the user preferred browsing path mining algorithm (MUPCDI) is based.
2. A MapReduce-based version of the credible interest degree algorithm is proposed and implemented. It runs on a Hadoop distributed cluster and can analyze and mine users' preferred browsing paths at massive scale.
3. On the target data set, MUPCDI is used for a comparative analysis of the threshold, accuracy, and effectiveness of the credible interest degree algorithm, and the MapReduce-based algorithm is used for a comparative analysis of the efficiency of the distributed platform.
The results indicate that the proposed credible interest degree algorithm mines users' preferred browsing paths more accurately and effectively, and that for large Web log data sets the MapReduce-based algorithm running in a distributed environment is far more efficient than in a single-machine environment.
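The abstract does not give the formulas for page interest degree or credible selection degree, so the following is only a rough sketch of the general map/reduce pattern for path mining, not the thesis's MUPCDI algorithm. It assumes sessions have already been extracted from the Web log as ordered lists of page names. The function names, the `selection_ratio` measure (a link's traversal count divided by the mean count over all links leaving the same page), and the threshold value are all hypothetical stand-ins chosen for illustration.

```python
from collections import defaultdict


def map_session(session):
    """Map step: emit ((source, target), 1) for each consecutive page pair."""
    for src, dst in zip(session, session[1:]):
        yield (src, dst), 1


def reduce_counts(pairs):
    """Shuffle and reduce step: sum the emitted values for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def selection_ratio(transition_counts):
    """Hypothetical measure (an assumption, not from the thesis):
    count(src -> dst) divided by the mean count over links leaving src."""
    outgoing = defaultdict(list)
    for (src, _dst), count in transition_counts.items():
        outgoing[src].append(count)
    return {
        (src, dst): count / (sum(outgoing[src]) / len(outgoing[src]))
        for (src, dst), count in transition_counts.items()
    }


def preferred_links(sessions, threshold=1.0):
    """Keep the links whose ratio reaches the (assumed) threshold."""
    emitted = (pair for session in sessions for pair in map_session(session))
    ratios = selection_ratio(reduce_counts(emitted))
    return {link: ratio for link, ratio in ratios.items() if ratio >= threshold}


if __name__ == "__main__":
    # Toy sessions; the page names are made up for the example.
    sessions = [["A", "B", "C"], ["A", "B", "D"], ["A", "C"]]
    for link, ratio in sorted(preferred_links(sessions, threshold=1.2).items()):
        print(link, round(ratio, 3))
```

Here the map and reduce steps run in a single process. In a Hadoop job they would run as separate tasks across the cluster, with the framework grouping the emitted pairs by key between the two phases.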
[Degree-granting institution]: Hunan University of Technology
[Degree level]: Master's
[Year degree granted]: 2015
[Classification number]: TP311.13
Article ID: 1551799
Article link: https://www.wllwen.com/guanlilunwen/yingxiaoguanlilunwen/1551799.html