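Research on Web Log Mining Technology for a Teaching Resource Search Platform
Published: 2018-05-19 02:10
Topics: Web log mining + data preprocessing; Source: Guangxi University, master's thesis, 2014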
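【Abstract】: As Web applications multiply, Web databases keep growing in scale and in the volume of data they hold. Web log mining applies data mining techniques to Web server log files to uncover the latent rules and patterns they contain, which are then applied to website architecture design, personalized services, and similar areas. The Web log mining process is usually divided into three stages: data preprocessing, pattern discovery, and pattern analysis. Of these, data preprocessing is the most important, because it directly affects the performance and results of the algorithms used in the later pattern discovery and pattern analysis stages. Session identification is the main step of data preprocessing, and also its most fundamental and critical one. The main research work of this thesis is as follows. (1) A Web session identification method based on a dynamic time threshold is proposed. Several commonly used session identification methods are described in detail and the strengths and weaknesses of each are analyzed. Building on the time-based heuristic method, an improved method is proposed that treats a request for the site homepage as the start of a new session and uses a dynamic time threshold to determine session boundaries; the algorithm flowchart and a concrete implementation are given. Experimental results show that the improved method not only identifies more real user sessions but also effectively improves the precision and recall of session identification. (2) A teaching resource search platform based on Web log mining is designed. The platform processes the IIS logs of the Guangxi University of Chinese Medicine website, using the log records of one day in July 2013 as the analysis data. The overall system architecture is designed, the functions of the main modules are described in detail, the data table structures and the flowchart of each stage are given, and a prototype system is implemented.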
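The abstract names the two ingredients of the improved session identification method (a homepage request starts a new session; a dynamic time threshold decides session boundaries) but does not give the threshold formula. The sketch below only illustrates the general shape of such a method under stated assumptions: the names `HOMEPAGE_URLS`, `MIN_GAP`, `MAX_GAP`, `FACTOR`, `dynamic_threshold`, and `identify_sessions` are hypothetical, and the rule "a multiple of the mean inter-request gap in the current session, clamped to a range" is an illustrative stand-in, not the thesis's actual formula.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Request:
    user: str         # user key, e.g. client IP + User-Agent (assumed already derived)
    timestamp: float  # request time in seconds
    url: str          # requested page


# Hypothetical parameters; the thesis abstract does not state its actual values.
HOMEPAGE_URLS = {"/", "/index.html"}
MIN_GAP = 10 * 60   # lower bound on the threshold, in seconds
MAX_GAP = 30 * 60   # upper bound, also used when no gap has been observed yet
FACTOR = 3.0        # multiplier applied to the mean gap within the current session


def dynamic_threshold(gaps: List[float]) -> float:
    """Illustrative rule: FACTOR x mean gap so far, clamped to [MIN_GAP, MAX_GAP]."""
    if not gaps:
        return MAX_GAP
    mean_gap = sum(gaps) / len(gaps)
    return min(MAX_GAP, max(MIN_GAP, FACTOR * mean_gap))


def identify_sessions(requests: List[Request]) -> List[List[Request]]:
    # Group each user's requests in chronological order.
    by_user: Dict[str, List[Request]] = {}
    for r in sorted(requests, key=lambda r: (r.user, r.timestamp)):
        by_user.setdefault(r.user, []).append(r)

    sessions: List[List[Request]] = []
    for reqs in by_user.values():
        current = [reqs[0]]
        gaps: List[float] = []
        for prev, cur in zip(reqs, reqs[1:]):
            gap = cur.timestamp - prev.timestamp
            # A homepage hit or an over-long pause closes the current session.
            if cur.url in HOMEPAGE_URLS or gap > dynamic_threshold(gaps):
                sessions.append(current)
                current = [cur]
                gaps = []
            else:
                current.append(cur)
                gaps.append(gap)
        sessions.append(current)
    return sessions
```

Clamping the threshold keeps a burst of quick clicks from producing an unreasonably short cutoff, while the upper bound corresponds to the fixed 30-minute timeout commonly used by purely time-based heuristics.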
[Abstract]: With the growing number of Web applications, Web databases continue to expand, and the volume of data they hold keeps increasing. Web log mining uses data mining techniques to mine and analyze Web server logs and to explore the latent rules and patterns in them, which are then applied to website architecture design, personalized services, and related areas. The Web log mining process is usually divided into three stages: data preprocessing, pattern discovery, and pattern analysis. Throughout this process, data preprocessing is the most important stage, because it directly affects the performance and results of the algorithms used for pattern discovery and pattern analysis. Session identification is the main step of data preprocessing, and also the most basic and critical one. The main work of this thesis is as follows. (1) A Web session identification method based on a dynamic time threshold is presented. Several commonly used session identification methods are described in detail, and the advantages and disadvantages of each are analyzed. Building on the time-based heuristic method, an improved method is proposed that takes the site's homepage as the start of a new session and uses a dynamic time threshold to determine session boundaries. The algorithm flowchart and implementation are given. Experimental results show that the improved method not only identifies more real user sessions but also effectively improves the precision and recall of session identification. (2) A teaching resource search platform based on Web log mining is designed. The platform takes the IIS logs of the Guangxi University of Chinese Medicine website as its processing object and uses the log records of one day in July 2013 as the analysis data. The overall system architecture is designed, the functions of the main modules are explained in detail, the data table structures and the flowchart of each stage are given, and a prototype system is implemented.
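Before sessions can be identified, the raw IIS logs have to be parsed and cleaned. The sketch below assumes the logs use the IIS W3C Extended format, in which a `#Fields:` directive names the space-separated columns (for example `date`, `time`, `cs-method`, `cs-uri-stem`, `c-ip`, `cs(User-Agent)`, `sc-status`). The cleaning rules are commonly used ones, not rules confirmed by the thesis: keep only GET requests, drop static resources, keep status codes from 200 to 399, and approximate a user by client IP plus User-Agent. The names `parse_iis_log`, `clean_records`, and `STATIC_SUFFIXES` are hypothetical.

```python
from datetime import datetime
from typing import Dict, Iterable, List

# Assumed list of static-resource suffixes to discard during cleaning.
STATIC_SUFFIXES = (".css", ".js", ".jpg", ".jpeg", ".gif", ".png", ".ico")


def parse_iis_log(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse W3C Extended log lines into dicts keyed by the #Fields: names."""
    fields: List[str] = []
    records: List[Dict[str, str]] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Directive line; remember the column names when they are declared.
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue
        values = line.split()
        if len(values) != len(fields):
            continue  # skip malformed lines or data seen before a #Fields: directive
        records.append(dict(zip(fields, values)))
    return records


def clean_records(records: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Keep successful GET requests for pages, dropping static resources."""
    cleaned: List[Dict[str, object]] = []
    for r in records:
        uri = r.get("cs-uri-stem", "")
        status = r.get("sc-status", "")
        if r.get("cs-method") != "GET":
            continue
        if uri.lower().endswith(STATIC_SUFFIXES):
            continue
        if not status.isdigit() or not 200 <= int(status) < 400:
            continue
        cleaned.append({
            # Assumed user key: client IP plus User-Agent string.
            "user": r.get("c-ip", "-") + "|" + r.get("cs(User-Agent)", "-"),
            # Date and time are separate columns in the W3C format.
            "time": datetime.strptime(r["date"] + " " + r["time"], "%Y-%m-%d %H:%M:%S"),
            "url": uri,
        })
    return cleaned
```

The cleaned records (user key, timestamp, URL) are exactly the input a session identification step needs, so the two stages can be chained directly.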
【Degree-granting institution】: Guangxi University
【Degree level】: Master's
【Year awarded】: 2014
【Classification number】: TP391.1; TP393.09