Research on Query Semantic Dependency Analysis Techniques

Published: 2018-05-24 10:15

Topics: semantic dependency analysis + semantic search; Source: Harbin Institute of Technology, 2012 master's thesis

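[Summary]: The rapid growth of the Internet means that information is produced and spread at an unprecedented pace. With the volume of information growing exponentially and junk information rampant, search engines face a serious challenge in finding the information that is genuinely useful to users. In a traditional search engine, the user enters a query and the engine returns a long list of web pages. The engine does not know what the user is asking or looking for; it simply retrieves the pages that contain the keywords through keyword matching, orders the results with a page-ranking algorithm, and presents the list to the user, who must then pick out the wanted information from that long list. Query semantic dependency analysis can, first, improve page ranking in traditional search engines: by analyzing the deep semantics of the query, it captures the user's need more accurately and lightens the burden of filtering information. Second, semantic search, as opposed to traditional search, has recently attracted wide attention from industry and academia. Instead of returning a list of information, semantic search organizes all information into one large knowledge base and, given a user's query, retrieves the answer directly from that knowledge base and returns it. Users thus skip the filtering step and reach their search goal more quickly and directly. Query semantic dependency analysis can help a semantic search engine understand the user's need more deeply and look up answers in the knowledge base more accurately. Beyond this, the technique has broad application prospects in automatic question answering, intelligent personal assistants, information retrieval, information extraction and related areas. This thesis proposes two query semantic dependency analysis techniques, one rule-based and one statistical. The main research content includes: (1) The similarities and differences between semantic dependency analysis of queries and of ordinary sentences. Compared with ordinary sentences, queries are short and loosely structured, so analyzing them differs considerably from semantic dependency analysis of ordinary sentences. (2) Determining the dependency relation scheme for query semantic dependency analysis, that is, choosing a suitable scheme based on the characteristics of queries and the needs of the application. The scheme must first be complete, meaning that it covers the main semantic phenomena; technical cost and application requirements must also be taken into account. This thesis defines five types of semantic dependency relation: attribute, limitation, agent, patient and requirement. The limitation relation is further divided into six subcategories: time, place, number, model, question and negation limitation. (3) Because the six special limitation subcategories are clearly and simply defined, a rule-based query semantic dependency analysis technique is proposed, covering the definition, writing and application of rules. (4) Semantic dependency analysis is recast as a classification problem, and a statistical query semantic dependency analysis technique is proposed, covering the mining of semantic resources and the design and selection of classification features. Finally, comparisons and experiments demonstrate the effectiveness of both the rule-based and the statistical methods.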
[Abstract]: With the rapid development of the Internet, information is produced and spread at an unprecedented speed. Faced with the exponential growth of information and a flood of junk information, search engines find it very challenging to locate the information that is really useful to users. In traditional search engines, users type a query and the search engine returns a long list of pages. It does not know what the user is asking or what the user is looking for. It just finds the pages that contain the keywords by keyword matching, sorts the result list with a page-ranking algorithm, and shows it to the user, who then has to filter the wanted information out of the long list. Query semantic dependency analysis technology can improve web page ranking in traditional search engines. It can deeply understand the semantics of a query, so as to understand the needs of users more accurately and lighten the burden of filtering information. On the other hand, compared with traditional search engines, semantic search has recently attracted extensive attention from industry and academia. Unlike a traditional search engine, which gives a list of information, semantic search organizes all the information into a huge knowledge base. Given the user's query, it retrieves the answer directly from the knowledge base and returns it. Thus, users are spared the step of filtering information and achieve the purpose of their search more quickly and directly. Query semantic dependency analysis technology can help semantic search engines understand user needs more deeply and search for answers in the knowledge base more accurately. In addition, query semantic dependency analysis technology has broad application prospects in automatic question answering, intelligent personal assistants, information retrieval, information extraction and so on. In this thesis, two query semantic dependency analysis techniques are proposed, one based on rules and one based on statistics.
The main research contents are as follows: (1) The similarities and differences between query semantic dependency analysis and semantic dependency analysis of ordinary sentences. Compared with ordinary sentences, queries are short in length and loose in structure, so their analysis is quite different from semantic dependency analysis of ordinary sentences. (2) The determination of the dependency relation system for query semantic dependency analysis: according to the characteristics of queries and the requirements of the application, a suitable dependency relation system is determined. In determining the system, its completeness should be considered first, that is, whether the main semantic phenomena are covered. Second, the technical costs, application requirements and so on should also be considered. In this thesis, five kinds of semantic dependency relations are defined: attribute, limitation, agent, patient and requirement. The limitation relation is divided into six subcategories, namely time limitation, place limitation, number limitation, model limitation, question limitation and negation limitation. (3) Because the six special limitation subcategories are clearly and simply defined, a rule-based query semantic dependency analysis technique is proposed, including the definition of rules, the compilation of rules, and the application of rules. (4) The semantic dependency analysis problem is transformed into a classification problem, and a statistics-based query semantic dependency analysis technique is proposed, including the mining of semantic resources and the design and selection of classification features. Finally, the effectiveness of both the rule-based and the statistical methods is demonstrated by comparison and experiment.
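As a rough illustration of what rule-based labeling of the six limitation subcategories might look like, the Python sketch below tags already-segmented query tokens with hand-written patterns. It is not the thesis's rule set: the `Limitation` enum, the `RULES` table, the patterns and the function name `tag_limitations` are all assumptions made for this example, and the sketch only labels tokens; it does not attach them to a head word as a full dependency analysis would.

```python
import re
from enum import Enum
from typing import List, Tuple


class Limitation(Enum):
    """The six limitation subcategories named in the abstract."""
    TIME = "time"
    PLACE = "place"
    NUMBER = "number"
    MODEL = "model"
    QUESTION = "question"
    NEGATION = "negation"


# Hypothetical toy rules, checked in order; the first full match wins.
# TIME precedes NUMBER so that a four-digit year is not read as a plain number.
RULES = [
    (Limitation.TIME, re.compile(r"(?:19|20)\d{2}年?|今天|明天|昨天")),
    (Limitation.NUMBER, re.compile(r"\d+(?:\.\d+)?")),
    # A model name is assumed to mix Latin letters and digits, e.g. "n95".
    (Limitation.MODEL, re.compile(r"(?=.*[A-Za-z])(?=.*\d)[A-Za-z0-9-]+")),
    (Limitation.QUESTION, re.compile(r"什么|怎么|哪里|多少|谁")),
    (Limitation.NEGATION, re.compile(r"不|没|没有")),
    # A tiny stand-in for a place-name gazetteer.
    (Limitation.PLACE, re.compile(r"北京|上海|哈尔滨")),
]


def tag_limitations(tokens: List[str]) -> List[Tuple[str, Limitation]]:
    """Return (token, subcategory) pairs for tokens matched by some rule."""
    tagged = []
    for token in tokens:
        for label, pattern in RULES:
            if pattern.fullmatch(token):
                tagged.append((token, label))
                break  # stop at the first matching rule
    return tagged


if __name__ == "__main__":
    for token, label in tag_limitations(["北京", "2012", "n95", "不", "多少", "3", "手机"]):
        print(token, label.value)
```

Tokens that match no rule, such as "手机" here, are simply left untagged; in a two-stage design like the one the abstract describes, such tokens would presumably be left to the statistical classifier.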
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree conferred]: 2012
[Classification number]: TP391.1
