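Concept Analysis of Query Sentences and Its Application in Retrieval
Published: 2018-03-31 21:19
Topic: query sentences. Focus: concept analysis. Source: Shanghai Jiao Tong University, master's thesis, 2013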
[Abstract]: In recent years, with the development of computer technology and the spread of the Internet, online resources have grown exponentially. This gives us a vast store of information but also brings the problem of information explosion. Faced with such diverse and complex online resources, how to obtain the information one needs from massive amounts of data, that is, how a retrieval system can return from a massive document collection the candidate documents that best match the user's need, has become a central concern.

Current information retrieval systems offer users only limited help: retrieval accuracy is low, and the flood of results, far from helping users, causes considerable trouble. The crux of the problem is that most existing retrieval systems adopt "discrete" models such as the Boolean model, in which user needs and documents are represented as discrete, unrelated strings. This destroys their conceptual integrity and introduces new noise. A feasible solution is to bring natural language understanding into retrieval and improve accuracy through deep semantic analysis. Specifically, needs and documents are indexed by semantic analysis, so that the basic unit of indexing is no longer a string but a complete concept. This establishes relations between the concepts in the need and those in the documents.

This thesis studies the conceptual analysis of Chinese user needs, an indispensable component of a Chinese concept retrieval system. Need analysis is the first step of the retrieval process; its purpose is to recover the user's retrieval intent in order to guide the subsequent retrieval work. Need analysis is therefore the primary task of a retrieval system, and its quality directly affects the performance of the whole system. It differs considerably from the analysis of text documents: beyond representing the user's query as conceptual information, it must, more importantly, accurately characterize the retrieval concept in the user's mind, working only from a vague expression of the user's need.

This thesis introduces a concept-based approach. At the conceptual level, it uses a semantic concept graph model to process Chinese query sentences and convert them into semantic concept graphs, linking the keywords entered by the user through their semantic relations into a graph with complete intension. The semantic and conceptual information is thus preserved throughout the semantic retrieval process, so that the relevance of returned web pages can be measured against the complete conceptual intension of the user's need, improving accuracy.

The thesis proposes a new approach to the conceptual analysis of user needs, analyzing the user's true intent at the level of the intensional concept graph. In particular, when handling needs phrased as questions, it extracts the focus information of the query sentence and substitutes it for the interrogative word in the sentence, constructing a concept graph that expresses the intensional semantics of the query. Working from the intension of Chinese concepts, the method analyzes user needs and recovers the user's retrieval intent fairly completely and accurately, guiding the subsequent retrieval work and thereby improving the accuracy of the retrieval system. This offers effective technical support for the development of new Chinese search engines.
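The focus-substitution idea in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the interrogative word list, the relation labels (`agent`, `patient`), the names `ConceptGraph`, `build_query_graph`, and `relevance`, and the rule that the focus node matches any concept are all assumptions made for illustration. The semantic relations are assumed to come from an upstream analyzer that is not shown.

```python
from dataclasses import dataclass, field

# Hypothetical interrogative words; a real system would use a fuller lexicon.
INTERROGATIVES = {"谁", "什么", "哪里", "何时", "多少"}


@dataclass
class ConceptGraph:
    """Concepts as nodes; labeled semantic relations as directed edges."""
    focus: str
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)  # (head, relation, tail) triples

    def add_relation(self, head, relation, tail):
        self.nodes.update((head, tail))
        self.edges.add((head, relation, tail))


def build_query_graph(relations, focus):
    """Build a query graph, substituting the focus for interrogative words.

    relations: (head, relation, tail) triples from an assumed upstream
               semantic analyzer.
    focus:     the concept the question asks about, e.g. "人" (person).
    """
    graph = ConceptGraph(focus=focus)
    for head, relation, tail in relations:
        head = focus if head in INTERROGATIVES else head
        tail = focus if tail in INTERROGATIVES else tail
        graph.add_relation(head, relation, tail)
    return graph


def edge_matches(query_edge, doc_edge, focus):
    """Relations must agree; each endpoint must be equal or be the focus.

    Treating the focus as a wildcard is a simplifying assumption.
    """
    (qh, qr, qt), (dh, dr, dt) = query_edge, doc_edge
    return qr == dr and qh in (dh, focus) and qt in (dt, focus)


def relevance(query_graph, doc_edges):
    """Fraction of query edges matched by at least one document edge."""
    if not query_graph.edges:
        return 0.0
    hits = sum(
        any(edge_matches(q, d, query_graph.focus) for d in doc_edges)
        for q in query_graph.edges
    )
    return hits / len(query_graph.edges)


if __name__ == "__main__":
    # "谁发明了电话" (Who invented the telephone?)
    query = build_query_graph(
        [("发明", "agent", "谁"), ("发明", "patient", "电话")], focus="人"
    )
    about_invention = {("发明", "agent", "贝尔"), ("发明", "patient", "电话")}
    about_usage = {("使用", "patient", "电话")}
    print(relevance(query, about_invention))  # 1.0
    print(relevance(query, about_usage))      # 0.0
```

Both example documents contain the word "电话", so a string-based Boolean match would retrieve both. Comparing relation triples instead separates the document about inventing the telephone from the one about using it, which is the kind of distinction the concept-level representation is meant to preserve.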
[Degree-granting institution]: Shanghai Jiao Tong University
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP391.3
Article ID: 1692485
Article link: https://www.wllwen.com/kejilunwen/sousuoyinqinglunwen/1692485.html