Query Intent Classification Based on Baidu Baike

Published: 2018-11-24 11:14

[Abstract]: Most pages on the World Wide Web are written in HTML, and as the number of pages grows rapidly, search becomes harder for search engines. If a search engine could automatically identify query intent and group the returned results by intent, users could find what they need within the matching intent category, which would greatly improve user satisfaction. In practice, a user's query terms may carry several intents, and a search engine may be able to predict the intended one by analyzing the user's browsing behavior. A search engine that automatically identifies the user's intent and ranks the results accordingly returns results that are far more useful to the user, so proactive prediction of query intent is key to the future of search. When a query is short and underspecifies the information need, most results returned by a general-purpose search engine do not match what the user wants. To address this inaccuracy, can a search engine classify its results by query intent? Query intent classification poses major challenges of its own in three areas: intent representation, intent scope, and sentence representation. This thesis proposes a query intent classification method based on Baidu Baike. The encyclopedia contains many concepts and categories, most concepts have domain-specific keywords, and each concept corresponds to one article. Sentence similarity is computed between a new user query and the concepts in the encyclopedia, and a random walk is then performed within the most similar category to obtain results that satisfy the user. Experimental results indicate that the proposed method performs well.
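The two steps named in the abstract (similarity between the query and encyclopedia concepts, followed by a random walk inside the most similar category) can be illustrated with a minimal sketch. The abstract does not state which similarity measure or which walk variant the thesis uses, so the bag-of-words cosine similarity and the random walk with restart below are assumptions, as are all function names, the toy English articles, the category labels, and the link graph. It is not the author's implementation.

```python
import math
from collections import Counter

def bag(text):
    """Bag-of-words vector (assumption: whitespace tokenization)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def most_similar_concept(query, articles):
    """Return the concept whose article text is closest to the query."""
    q = bag(query)
    return max(articles, key=lambda c: cosine(q, bag(articles[c])))

def random_walk_with_restart(links, start, restart=0.15, iters=50):
    """Power iteration for a random walk that jumps back to `start`."""
    score = {n: 1.0 if n == start else 0.0 for n in links}
    for _ in range(iters):
        nxt = {n: (restart if n == start else 0.0) for n in links}
        for n, out in links.items():
            if not out:
                # Dangling node: send its mass back to the start concept.
                nxt[start] += (1 - restart) * score[n]
                continue
            share = (1 - restart) * score[n] / len(out)
            for m in out:
                nxt[m] += share
        score = nxt
    return score

def classify(query, articles, category, links):
    """Pick the category of the best-matching concept, then rank the
    concepts of that category by random-walk score."""
    best = most_similar_concept(query, articles)
    cat = category[best]
    # Keep only concepts and links that stay inside the chosen category.
    sub = {c: [m for m in links.get(c, []) if category.get(m) == cat]
           for c in articles if category[c] == cat}
    scores = random_walk_with_restart(sub, best)
    return cat, sorted(scores, key=scores.get, reverse=True)

# Hypothetical toy data standing in for encyclopedia articles.
ARTICLES = {
    "Apple Inc.": "apple company iphone mac technology",
    "Apple (fruit)": "apple fruit tree sweet orchard",
    "iPhone": "iphone smartphone apple company technology",
    "Pear": "pear fruit tree sweet",
}
CATEGORY = {"Apple Inc.": "Technology", "iPhone": "Technology",
            "Apple (fruit)": "Food", "Pear": "Food"}
LINKS = {"Apple Inc.": ["iPhone"], "iPhone": ["Apple Inc."],
         "Apple (fruit)": ["Pear"], "Pear": ["Apple (fruit)"]}

print(classify("apple fruit sweet", ARTICLES, CATEGORY, LINKS))
```

The restart probability keeps the walk concentrated near the concept that matched the query, so concepts linked closely to it rank ahead of distant ones in the same category. A real system would need Chinese word segmentation and a much larger link graph.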
[Degree-granting institution]: Jilin University
[Degree level]: Master's
[Year degree conferred]: 2013
[Classification number]: TP391.3
