
Research on Natural Language Question Answering Methods Based on Knowledge Bases

Published: 2018-04-16 03:24

  Topics: knowledge base question answering + word vectors; Source: University of Science and Technology of China, master's thesis, 2017


[Abstract]: Knowledge base question answering (KBQA) takes a question posed in natural language and answers it using a structured knowledge base; it is an important research direction in natural language processing. The main approaches fall into three categories: methods based on information extraction, methods based on semantic parsing, and methods based on vector space modeling. The key techniques include knowledge extraction and representation, semantic representation of user questions, and answer generation from the knowledge base. Limited by factors such as the accuracy of question semantic representations and the scale of question-answer training data, the performance of current KBQA systems still needs improvement. In addition, open-source, large-scale, open-domain Chinese knowledge bases are scarce, which also constrains research on Chinese KBQA.

This thesis addresses the KBQA task from several angles: question semantic representation, training data preparation, and Chinese knowledge base construction. Its main contributions are a word vector construction method for scoring paraphrased questions in KBQA, a KBQA method that incorporates neural question generation, and a knowledge fusion method for Chinese knowledge base construction.

Conventional word vectors are obtained through task-independent unsupervised training, so when they are used to score paraphrased questions in KBQA they cannot reflect sentence-level semantic constraints. The thesis therefore proposes a word vector training method based on paraphrase knowledge constraints. The method introduces sentence-level semantic constraint information during word vector training and, without changing how sentence semantics are composed, improves sentence-level semantic representations by optimizing word-level semantic vectors. This in turn improves both paraphrase question scoring and the answer accuracy of the KBQA system.

Existing KBQA methods based on vector space modeling depend on training data, yet manually producing question-answer pairs at scale is difficult. To address this, the thesis introduces question generation based on an encoder-decoder neural network model into KBQA system construction: a question generation model automatically generates questions from knowledge base triples, and the generated questions are used to train the KBQA model. Experimental results show that model-generated questions improve the accuracy of the KBQA system compared with questions generated from conventional templates.

Finally, the thesis presents a Chinese knowledge base construction method based on knowledge fusion. The method first extracts information from the infoboxes of Baidu Baike (百度百科) pages to build an initial knowledge base, then applies entity alignment based on link-word information and attribute mapping based on the Jaccard coefficient to fuse the initial knowledge base with the existing Freebase knowledge base. Chinese knowledge bases built for several domains, such as people and geography, verify that the method can effectively extend a knowledge base on top of an existing ontology.
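To make the Jaccard-based attribute mapping step concrete, here is a minimal sketch. The abstract does not say which sets the coefficient is computed over, so this example assumes each attribute is represented by its set of (aligned entity id, normalized value) pairs; the function names, the threshold, and the sample data are all hypothetical and are not taken from the thesis.

```python
def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B|; defined as 0.0 for two empty sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def map_attributes(source_attrs, target_attrs, threshold=0.5):
    """Map each source attribute to its best-scoring target attribute.

    Both arguments are dicts: attribute name -> set of (entity id, value) pairs.
    ASSUMPTION: entities were aligned beforehand and values were normalized
    to a shared format; the abstract does not describe these details.
    """
    mapping = {}
    for s_name, s_pairs in source_attrs.items():
        best_name, best_score = None, 0.0
        for t_name, t_pairs in target_attrs.items():
            score = jaccard(s_pairs, t_pairs)
            if score > best_score:
                best_name, best_score = t_name, score
        # Keep the mapping only if the overlap is strong enough.
        if best_name is not None and best_score >= threshold:
            mapping[s_name] = (best_name, best_score)
    return mapping


# Hypothetical toy data: infobox attributes vs. Freebase-style properties.
baike = {
    "出生日期": {("e1", "1879-03-14"), ("e2", "1867-11-07"), ("e3", "1643-01-04")},
    "国籍": {("e1", "德国")},
}
freebase = {
    "people.person.date_of_birth": {
        ("e1", "1879-03-14"), ("e2", "1867-11-07"), ("e4", "1912-06-23"),
    },
    "people.person.nationality": {("e1", "Germany")},
}

mapping = map_attributes(baike, freebase)
print(mapping)  # {'出生日期': ('people.person.date_of_birth', 0.5)}
```

In this toy data the nationality attribute is left unmapped because its values are written in different languages and never overlap, which suggests why value normalization (or a signal other than raw value overlap) would matter in practice.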
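[Degree-granting institution]: University of Science and Technology of China
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP391.1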

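[Related literature]

Related master's theses: top 1

1 詹晨迪 (Zhan Chendi); Research on Natural Language Question Answering Methods Based on Knowledge Bases [D]; University of Science and Technology of China; 2017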



Article ID: 1757091


Link to this article: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/1757091.html

