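Research on Query Expansion Based on a Dependency Relation Network
Published: 2018-06-05 19:10
Topics: real-time query expansion + dependency relations; Source: Beijing University of Posts and Telecommunications, 2013 master's thesis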
[Abstract]: With the rapid growth of information on the Internet, search engines have become an indispensable tool for finding online information quickly. A user only needs to enter query words to obtain search results. However, a query usually consists of only a few words and is often ambiguous, so it does not always reflect the user's intent accurately, and irrelevant results are returned.
Real-time query expansion is a technique that augments the user's input so that it reflects the query intent more accurately. By recommending new query words to the user in real time, it can complete the user's query, reduce the amount of typing required, and resolve ambiguity of intent. Traditional real-time query expansion techniques mostly rely on query logs and perform query completion and query recommendation based on keyword frequency.
This thesis first proposes a method for representing query intent based on "verb + modifier + noun" dependency relations. From a large-scale corpus analysis of 915,600 documents with a total size of 1.15 GB, it constructs a dependency relation network with more than 50,000 nodes.
It then proposes a method that uses this large-scale dependency relation network to perform real-time query expansion for users. Experiments show that the method achieves an expansion success rate of 84% and reduces the amount of input a user needs for a query.
Finally, a real-time query expansion system with full retrieval functionality is implemented. The system combines the query word expansion technique above with string-based word completion to perform real-time query expansion. System evaluation shows that it reduces user operations by 63.75%, and that after expansion the nDCG score of the retrieval results reaches 88.95%. A comparison with Microsoft's Bing search engine shows that the system expands queries more stably when the user enters the words in a different order.
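As an illustration only, the idea of expanding a query from a network built out of "verb + modifier + noun" relations can be sketched in Python as follows. The class name DependencyNetwork, the toy triples, and the co-occurrence-count scoring are all assumptions made for this sketch; the abstract does not describe the thesis's actual network construction or ranking method.

```python
from collections import defaultdict


class DependencyNetwork:
    """Toy graph over words taken from (verb, modifier, noun) triples.

    Hypothetical structure for illustration; not the thesis's implementation.
    """

    def __init__(self):
        # word -> neighbour -> number of triples in which both words appear
        self.edges = defaultdict(lambda: defaultdict(int))

    def add_triple(self, verb, modifier, noun):
        """Link every pair of distinct words that occur in the same triple."""
        words = [w for w in (verb, modifier, noun) if w]
        for a in words:
            for b in words:
                if a != b:
                    self.edges[a][b] += 1

    def expand(self, query_words, top_k=3):
        """Rank candidate words by total edge weight to the words already typed."""
        typed = set(query_words)  # a set, so the order of typed words is irrelevant
        scores = defaultdict(int)
        for word in typed:
            for neighbour, weight in self.edges.get(word, {}).items():
                if neighbour not in typed:
                    scores[neighbour] += weight
        # Highest score first; ties broken alphabetically for a stable result
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return [word for word, _ in ranked[:top_k]]


# Toy triples standing in for relations extracted from a parsed corpus
net = DependencyNetwork()
net.add_triple("buy", "cheap", "ticket")
net.add_triple("buy", "train", "ticket")
net.add_triple("book", "cheap", "hotel")

print(net.expand(["buy", "ticket"]))
print(net.expand(["ticket", "buy"]))
```

Because the typed words are treated as a set, both calls return the same suggestions, which mirrors the word-order stability the abstract reports, although the real system may achieve it differently.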
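[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP391.3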
Article ID: 1983110
Article link: https://www.wllwen.com/kejilunwen/sousuoyinqinglunwen/1983110.html