Research on Question Answering Techniques for the History Subject

Published: 2018-08-27 12:06
[Abstract]: In recent years, artificial intelligence has achieved breakthroughs in many areas and has therefore attracted growing attention. Automatic question answering is an important branch of artificial intelligence and a long-term research goal in natural language processing. Existing question answering systems are usually divided into retrieval-based systems and knowledge-base-based systems. Both need background knowledge stored in advance: a knowledge base holds structured, easily interpreted data, whereas a retrieval-based system typically draws on a large volume of Internet text. In either case, answering a question involves issuing queries that produce a number of candidate answers, computing how relevant each candidate is to the question, discarding the irrelevant candidates, and finally returning the best answer.

This thesis studies question answering techniques for the history subject, covering question classification, question component extraction, and confidence ranking of candidate answers. Given a question, the system first analyzes it to construct queries, then retrieves a number of candidate passages, and finally ranks the sentences in those passages by confidence to obtain a short, accurate answer. The thesis attempts to apply deep learning to question classification, question component extraction, and answer confidence ranking. The specific contributions are as follows:

1. A question classification corpus and a question component extraction corpus are built for the history subject; history material questions are classified and the key elements of each question are identified. In addition, a dataset for answer confidence ranking in the history subject is built.
2. A deep-learning-based question classification model is constructed and compared with the traditional SVM method. The experiments show that the deep learning methods clearly outperform the traditional method; the CNN model performs best, reaching a Micro-F1 of 91.08% and a Macro-F1 of 86.80%.
3. Question component extraction experiments are carried out with a CRF model and an LSTM-CRF model. The results show that on a small corpus the traditional CRF model outperforms the deep learning method, reaching an F1 of 88.51%.
4. A deep-learning-based answer confidence ranking algorithm is constructed, and CNN and LSTM are compared on answer selection; the LSTM model outperforms the CNN model. The thesis also discusses how different confidence calculation methods and different loss functions affect answer confidence, and further proposes a confidence measure that harmonizes cosine similarity and Euclidean distance. The best results are obtained with the harmonized confidence measure and the hinge loss, with MAP and MRR of 0.4320 and 0.6120 respectively.
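MAP and MRR, the two figures reported for answer confidence ranking, are standard ranking metrics. The following is a minimal sketch of how they are commonly computed, not the thesis's own evaluation code. It assumes each question's candidate answers are already sorted by descending confidence and labeled 1 (correct) or 0 (incorrect), and that a question with no correct candidate contributes 0 to both averages. All function names and the toy labels are illustrative.

```python
from typing import Dict, List, Sequence


def reciprocal_rank(labels: Sequence[int]) -> float:
    """Return 1/rank of the first correct candidate, or 0.0 if none is correct."""
    for rank, label in enumerate(labels, start=1):
        if label:
            return 1.0 / rank
    return 0.0


def average_precision(labels: Sequence[int]) -> float:
    """Average the precision values measured at each correct candidate's rank."""
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(labels, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def evaluate(ranked_labels: List[Sequence[int]]) -> Dict[str, float]:
    """Compute MAP and MRR over a list of per-question ranked label lists."""
    if not ranked_labels:
        raise ValueError("need at least one question to evaluate")
    n = len(ranked_labels)
    return {
        "MAP": sum(average_precision(q) for q in ranked_labels) / n,
        "MRR": sum(reciprocal_rank(q) for q in ranked_labels) / n,
    }


if __name__ == "__main__":
    # Hypothetical labels for two questions, candidates sorted by confidence.
    toy_rankings = [[1, 0, 1], [0, 1, 0, 1]]
    print(evaluate(toy_rankings))
```

MRR rewards only the position of the first correct answer, while MAP also credits correct answers found further down the list, so the two can diverge when a question has several correct candidates.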
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.1; TP18

Article ID: 2207227


Article link: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/2207227.html

