
Research on Answer Extraction for Internet-Based Automatic Question Answering

Published: 2018-01-28 22:42

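  Keywords: automatic question answering; answer extraction; graph model; learning to rank; word representation; paraphrase. Source: Tianjin University, 2014 doctoral dissertation. Document type: degree thesis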


[Abstract]: Internet-based automatic question answering answers natural-language questions using the results returned by a search engine. It can take full advantage of the search engine's high-quality results and removes the need to store a large document collection. Answer extraction generates an answer from the retrieved text and consists of candidate generation and candidate ranking. Because search snippets are noisy and often contain structurally incomplete sentences, answer extraction over search results differs substantially from answer extraction over well-formed text, and traditional methods perform worse on this task. This doctoral dissertation discusses how to optimize answer extraction for the problems specific to search results, and covers the following topics.

For the problem that the features signalling a correct answer are weak in some search results, the dissertation proposes a candidate generation method based on a paragraph graph model. Candidate generation in one paragraph can receive information from other paragraphs, which helps improve the candidates generated in the current paragraph. Experiments show that the model effectively improves the precision and recall of candidate generation.

For the problem that search results are noisy and syntactically incomplete, the dissertation proposes pruned rank fusion to integrate different candidate generation methods, and re-ranks the candidates with learning to rank. This framework effectively reduces the impact of noise in the search results. Experiments show that the ranking method outperforms the best existing algorithms on candidate ranking over search results.

For the problem that the wording of search results differs substantially from the original question, and that similarity computation scales poorly, the dissertation proposes two word-representation-based methods for computing the similarity between the question and candidate answers: the textual similarity between the search results and the question, and the semantic similarity between a candidate answer and the answer type. Experiments show that both similarities effectively improve candidate ranking.

For the problem of differences in expression between search results and questions, the dissertation explores the application of paraphrase generation. It proposes a method that generates paraphrases with dual machine translation systems trained by joint learning, together with evaluation metrics for paraphrase generation. Generating paraphrases of the question with this method increases the diversity of the paraphrases and reduces the impact of differences in expression when computing similarity. Experiments show that the proposed paraphrase generation method improves candidate ranking.

Taken together, the dissertation uses the paragraph graph model for candidate generation, then combines it with other candidate generation methods and ranks the candidates with learning to rank. On this basis, similarity features computed from word vectors and paraphrases further improve the ranking. The research reduces the impact that snippet noise and related problems have on question-answering results when answers are generated from search results, so that answer extraction for Internet-based automatic question answering, without relying on syntactic or semantic similarity, achieves results that exceed the best existing answer extraction methods.
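As a rough illustration of a word-representation-based similarity feature between a question and a search snippet, the sketch below averages word vectors and compares them by cosine similarity. It is not the dissertation's method: the abstract does not specify how the similarity is computed, and the toy vectors, the averaging scheme, and the names `TOY_VECTORS`, `average_vector`, and `text_similarity` are all assumptions made for this example.

```python
import math

# Hypothetical 3-dimensional word vectors, invented for illustration.
# A real system would load embeddings trained on a large corpus.
TOY_VECTORS = {
    "capital": [0.9, 0.1, 0.0],
    "city": [0.8, 0.2, 0.1],
    "france": [0.1, 0.9, 0.0],
    "paris": [0.3, 0.8, 0.1],
    "banana": [0.0, 0.1, 0.9],
}


def average_vector(tokens, vectors):
    """Average the vectors of known tokens; return None if no token is known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def text_similarity(question, snippet, vectors=TOY_VECTORS):
    """Similarity of two texts as the cosine of their averaged word vectors."""
    q = average_vector(question.lower().split(), vectors)
    s = average_vector(snippet.lower().split(), vectors)
    if q is None or s is None:
        return 0.0  # no known words on one side: treat as unrelated
    return cosine(q, s)


related = text_similarity("capital France", "Paris city")
unrelated = text_similarity("capital France", "banana")
print(related > unrelated)
```

A score like this could serve as one feature among several in a learning-to-rank model; because it depends only on word vectors, it needs no syntactic parse of the noisy snippet.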
[Degree-granting institution]: Tianjin University
[Degree level]: Doctoral
[Year degree granted]: 2014
[Classification number]: TP391.1


Article ID: 1471749


Article link: https://www.wllwen.com/kejilunwen/sousuoyinqinglunwen/1471749.html

