Research on Identifying Users' Information Needs Based on Web Log Mining

Published: 2019-05-08 04:40

【Abstract】: Information explosion and information disorientation are among the problems that every information user faces today. On the Internet, we want search engines to help us find the information we actually need within a vast amount of content. However, because of users' own knowledge, background, environment and other factors, the query terms they submit to a search engine often fail to express their information needs accurately. Some researchers have studied users' search-engine behavior on its own, hoping to infer users' interests from that behavior; others have collected users' information needs through online questionnaires. This thesis differs in that it combines the characteristics of users' information behavior with data mining techniques to build a model that identifies users' information needs automatically, with the aim of using this model to improve search-engine efficiency. The focus is on mining users' query logs through their information-behavior characteristics and building an automatic classification model of user information needs. First, the thesis studies and analyzes the theory of Web log mining and of user information needs, sets out the theoretical basis of the research, and states the problems to be studied. Second, it describes the data preprocessing stage of log mining in detail, covering the data source, the data format, and preprocessing operations such as cleaning and transforming the log data and identifying users. Third, it classifies users' information search behavior, focusing on latent search behavior, and uses simple statistical methods to summarize some basic behavioral characteristics and patterns of search-engine users. Finally, it divides search-engine users' information needs into navigational needs and informational/transactional needs, and uses users' information-behavior characteristics to build an automatic classification model of user information needs.
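The abstract does not give the thesis's log format, feature set, or classifier, so the following is only a minimal Python sketch of the general pipeline it outlines: cleaning and transformation, user identification, simple descriptive statistics, and a two-class split into navigational vs. informational/transactional needs. It assumes a hypothetical tab-separated record `user_id, query, rank` (rank = position of the clicked result), identifies users by that assumed `user_id` field, and substitutes a plain click-based threshold rule for the thesis's actual model. All function names and thresholds are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class LogRecord:
    user_id: str  # anonymized user identifier (assumed present in the log)
    query: str    # normalized query string
    rank: int     # position of the clicked result in the result list


def parse_and_clean(lines):
    """Cleaning and transformation step (hypothetical format: user_id, query, rank).

    Malformed lines, empty queries and non-numeric ranks are dropped;
    queries are whitespace-trimmed and lower-cased.
    """
    records = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3:
            continue  # wrong field count: malformed line
        user_id, query, rank = (f.strip() for f in fields)
        query = query.lower()
        if not user_id or not query or not rank.isdigit():
            continue
        records.append(LogRecord(user_id, query, int(rank)))
    return records


def group_by_user_query(records):
    """Identify users by the assumed user_id field and group their clicks by query."""
    groups = defaultdict(list)
    for r in records:
        groups[(r.user_id, r.query)].append(r)
    return groups


def basic_stats(records):
    """Simple descriptive statistics of search behavior."""
    pairs = {(r.user_id, r.query) for r in records}
    return {
        "users": len({r.user_id for r in records}),
        "user_query_pairs": len(pairs),
        "clicks_per_pair": len(records) / len(pairs) if pairs else 0.0,
    }


def classify(clicks, max_clicks=1, top_rank=1):
    """Illustrative threshold rule, NOT the thesis's model: at most max_clicks
    clicks, all on results ranked top_rank or better, counts as navigational."""
    if len(clicks) <= max_clicks and all(c.rank <= top_rank for c in clicks):
        return "navigational"
    return "informational/transactional"


if __name__ == "__main__":
    sample = [
        "u1\tbaidu\t1",
        "u2\tweb log mining\t3",
        "u2\tweb log mining\t1",
        "u3\t\t1",         # empty query: removed during cleaning
        "malformed line",  # wrong field count: removed during cleaning
    ]
    records = parse_and_clean(sample)
    print(basic_stats(records))
    for (user, query), clicks in group_by_user_query(records).items():
        print(user, query, classify(clicks))
```

The threshold rule is kept deliberately simple so that each pipeline stage stays visible; a classifier trained on richer behavioral features could replace `classify` without changing the earlier stages.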
【Degree-granting institution】: Central China Normal University
【Degree level】: Master's
【Year conferred】: 2012
【CLC classification number】: G350

【Citing Literature】