Research on Query Expansion Based on Ontology and User Logs
Published: 2018-05-03 19:12
Topics: ontology + query expansion; Source: Hunan University, 2013 master's thesis
[Abstract]: With the explosive growth of information on the Internet, it has become increasingly difficult for users to find the information they actually want among the vast amount available. Search engines solve this problem to some extent. However, users often cannot express their query intent precisely: query terms are frequently ill-chosen or too short, which lowers the recall and precision of the search engine so that it fails to return useful results. Expanding user queries has therefore become an urgent need.

Query expansion has been developed over several decades, and researchers in China and abroad have proposed many methods. However, these common methods often cannot interpret user input at the semantic level, and because the source of their expansion terms is uncertain, they tend to add terms unrelated to the query, causing the "query drift" problem. This thesis combines a domain ontology with user query logs and proposes a query expansion algorithm based on ontology and user logs. The domain ontology is used to expand the user query at the semantic level into an initial set of expansion concepts, and word co-occurrence analysis over the user query log is then used to filter this initial set a second time. The main contents are as follows:

(1) The research background and significance of the topic are described, the progress and shortcomings of current query expansion techniques are analyzed, and the relevant background knowledge and theory are introduced, laying the theoretical foundation for the work that follows.

(2) An ontology-based formula for computing the semantic similarity of concepts is proposed. It is applied to candidate expansion terms so that the user query is expanded at the semantic level.

(3) A word co-occurrence formula based on user logs is proposed. Co-occurrence is computed for the initial expansion terms, and the result serves as each term's co-occurrence weight. The semantic similarity weight and the co-occurrence weight of each expansion term are then combined for a second round of filtering, which avoids the "query drift" that the initial expansion is prone to.

(4) Based on the proposed algorithm, a prototype system was designed and implemented to meet the query requirements of an after-sales service tracking system for domestically produced software and hardware. The overall framework of the system and each of its modules are described. Finally, comparative experiments were run on the system. The results show that, compared with traditional query expansion methods, the proposed method effectively improves retrieval accuracy while maintaining good robustness.
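The two-stage procedure described in the abstract can be sketched roughly as follows. The abstract does not give the thesis's formulas, so everything concrete here is an assumption made for illustration: the toy ontology `PARENT`, the sample `LOG`, the path-based similarity 1 / (1 + path length), the Dice coefficient as the co-occurrence measure, the linear mixing factor `alpha`, and both thresholds. It is a minimal sketch of the general idea, not the author's method.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy ontology: child concept -> parent concept (a tree).
PARENT = {
    "laser printer": "printer",
    "inkjet printer": "printer",
    "printer": "hardware",
    "scanner": "hardware",
    "hardware": "product",
    "driver": "software",
    "software": "product",
}
CONCEPTS = set(PARENT) | set(PARENT.values())

# Hypothetical query log: each entry is the set of concept terms in one query.
LOG = [
    {"printer", "laser printer"},
    {"printer", "laser printer", "driver"},
    {"printer", "inkjet printer"},
    {"scanner", "driver"},
    {"printer", "driver"},
    {"laser printer"},
]


def ancestors(concept):
    """Return the path from a concept up to the root, concept first."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def semantic_similarity(a, b):
    """Assumed stand-in formula: 1 / (1 + number of edges between a and b)."""
    index_in_b = {c: i for i, c in enumerate(ancestors(b))}
    for i, c in enumerate(ancestors(a)):
        if c in index_in_b:  # first shared ancestor on the way up from a
            return 1.0 / (1.0 + i + index_in_b[c])
    return 0.0


def build_dice(log, vocabulary):
    """Count terms and term pairs per query; return a Dice-coefficient function."""
    term_count, pair_count = Counter(), Counter()
    for entry in log:
        present = sorted(set(entry) & vocabulary)
        term_count.update(present)
        pair_count.update(combinations(present, 2))

    def dice(a, b):
        total = term_count[a] + term_count[b]
        pair = tuple(sorted((a, b)))
        return 2.0 * pair_count[pair] / total if total else 0.0

    return dice


def expand(term, log, alpha=0.5, sem_threshold=0.3, final_threshold=0.3):
    """Stage 1: ontology-based candidates. Stage 2: filter by combined weight."""
    dice = build_dice(log, CONCEPTS)
    # Stage 1: keep concepts that are semantically close to the query term.
    candidates = {}
    for concept in CONCEPTS - {term}:
        sim = semantic_similarity(term, concept)
        if sim >= sem_threshold:
            candidates[concept] = sim
    # Stage 2: mix similarity with log co-occurrence and filter again.
    scored = []
    for concept, sim in candidates.items():
        score = alpha * sim + (1.0 - alpha) * dice(term, concept)
        if score >= final_threshold:
            scored.append((concept, score))
    return sorted(scored, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for concept, score in expand("printer", LOG):
        print(f"{concept}: {score:.3f}")
```

With this toy data, the first stage admits "scanner", "hardware", and "product" because they are close to "printer" in the ontology, but they never co-occur with "printer" in the log, so the second stage drops them. That is the kind of drift-prone term the second filtering pass is meant to remove.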
[Degree-granting institution]: Hunan University
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP391.1
Article ID: 1839733
Article link: https://www.wllwen.com/kejilunwen/sousuoyinqinglunwen/1839733.html