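Search Optimization Based on the Session Process
Published: 2018-07-05 06:14
Topics: information retrieval + session Markov random field; Source: Beijing University of Posts and Telecommunications, 2013 master's thesis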
[Abstract]: With the explosive growth of information on the Internet, search engines play a vital role in finding information online. When faced with massive data, however, traditional search algorithms have limitations. First, keyword-oriented search places high demands on the user's ability to formulate queries. Second, matching a user's short query against a massive body of information yields low precision and recall. Finally, general-purpose search algorithms cannot provide personalized retrieval services. To address these problems, this thesis takes session information as its object and studies search optimization based on the session process.
A session process is the series of query reformulations and interactions with search results that a user carries out during a search in order to satisfy a predetermined search need, including clicks on search result pages, browsing time, and similar information. Building on session information, this thesis proposes a session retrieval model based on Markov random fields to achieve search optimization. The main research covers the following aspects.
First, with Markov random fields as the theoretical basis, a retrieval model oriented to the session process is constructed. By analyzing users' search behavior patterns and starting from the temporal characteristics of the session process, a dynamically evolving session retrieval model is built.
Second, on the basis of an analysis of linguistic characteristics, the thesis studies how term dependence assumptions can improve session retrieval. Starting from the full term independence pattern (FIP) and the term sequential dependence pattern (SDP), it constructs two kinds of session retrieval models, FISM and SDSM, and then examines the effect of term dependence assumptions on session retrieval.
Third, on the basis of a categorization of session information, the thesis studies how much influence each kind of session information has on retrieval. Session information is divided into two categories: historical queries (HQ) and historically clicked web pages (HC). Through the definition of the session retrieval model, four ways of constructing query elements, E(Qi), E(Ci), E(Qi+Ci), and E(WAFi), are used to combine each kind of historical information effectively with the retrieval process.
Fourth, with word activation force as the theoretical basis, session information is used for query expansion, and the effectiveness of a session retrieval model based on word activation force is studied.
For the research points above, the thesis designs and implements a set of experiments organized by type of session retrieval model. The experimental results show that the session retrieval model based on Markov random fields can achieve search optimization.
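The abstract names the FISM and SDSM models but does not give their formulas, so the following is only a minimal sketch of the general idea, assuming the standard Markov random field retrieval framework with a full-independence variant (unigram features only) and a sequential-dependence variant (unigrams plus adjacent ordered term pairs). The Dirichlet smoothing, the weights `lam_t` and `lam_o`, the `history_weight` parameter, and all function names are illustrative assumptions, not the thesis's actual definitions.

```python
import math
from collections import Counter


def build_stats(docs):
    """Collect collection-level unigram and adjacent-bigram counts."""
    tf, bf = Counter(), Counter()
    for d in docs:
        tf.update(d)
        bf.update(zip(d, d[1:]))
    return {"tf": tf, "bf": bf, "len": sum(tf.values())}


def _smoothed_log(count, coll_count, doc_len, coll_len, mu):
    # Dirichlet-smoothed log-probability; the 0.5 floor avoids log(0)
    # for features never seen in the collection (an assumption).
    p_coll = max(coll_count, 0.5) / coll_len
    return math.log((count + mu * p_coll) / (doc_len + mu))


def mrf_score(query, doc, stats, sequential=False,
              lam_t=0.85, lam_o=0.15, mu=2000.0):
    """Score one document for one query.

    sequential=False: full independence (terms scored one by one).
    sequential=True:  sequential dependence (adds adjacent query pairs).
    """
    n = len(doc)
    term_part = sum(
        _smoothed_log(doc.count(t), stats["tf"].get(t, 0), n, stats["len"], mu)
        for t in query
    )
    if not sequential:
        return term_part
    doc_bigrams = Counter(zip(doc, doc[1:]))
    pair_part = sum(
        _smoothed_log(doc_bigrams[p], stats["bf"].get(p, 0), n, stats["len"], mu)
        for p in zip(query, query[1:])
    )
    return lam_t * term_part + lam_o * pair_part


def session_score(current, history, doc, stats,
                  sequential=False, history_weight=0.3):
    """Add the mean score of earlier session queries to the current query's
    score. This linear combination is a hypothetical stand-in for how
    historical queries (HQ) might enter the model."""
    score = mrf_score(current, doc, stats, sequential)
    if history:
        past = sum(mrf_score(q, doc, stats, sequential) for q in history)
        score += history_weight * past / len(history)
    return score
```

In this sketch, two documents that contain the same query terms the same number of times receive identical scores under full independence, while the sequential-dependence variant prefers the document in which the query terms appear adjacent and in order. That difference is the kind of effect the term dependence assumption is meant to capture.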
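[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP391.3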
Article ID: 2099254
Article link: https://www.wllwen.com/kejilunwen/sousuoyinqinglunwen/2099254.html