
Research on Automatic Question Answering Methods Based on Domain Knowledge

Published: 2018-05-07 11:23

  Topics: automatic question answering + translation model; Source: Harbin Institute of Technology, master's thesis, 2016


[Abstract]: In an era of widespread mobile Internet access, obtaining information has become increasingly convenient, and demand for information keeps growing. Meeting the information needs of different groups of people across different domains poses a major challenge for search engines on the mobile Internet. The steady maturing of question answering (QA) technology has largely overcome the shortcomings that search engines have shown, and it gives people a more natural mode of human-computer interaction. A QA system can understand questions posed in natural language fairly accurately and use knowledge base retrieval to return concise answers promptly, which effectively meets users' needs. With advances in artificial intelligence, natural language processing, and related technologies, different kinds of QA systems have emerged for different forms of data. In recent years, many research institutions in China and abroad have begun to study human-like intelligence and to apply QA techniques to examinations. This project targets the history portion of the comprehensive liberal arts paper of China's college entrance examination (gaokao). It uses natural language processing and QA techniques to build an automatic answering system that can solve gaokao history short-answer questions. The main research contents are as follows.

Data preprocessing and platform construction. The thesis analyzes a sample of past gaokao history questions and summarizes the question types and the main difficulties in solving them. Based on the main issues of each question type, it collects and stores the data for a domain knowledge base and builds a history retrieval system, which keeps the answering system running properly. Because history questions may contain source material written in classical Chinese, a parallel corpus of a certain scale is collected from the Internet and used to train a classical Chinese detection model and a classical Chinese translation model.

Candidate answer discovery based on the knowledge base. To retrieve documents relevant to a question accurately from the knowledge base, the thesis first applies the traditional steps of keyword extraction, information retrieval, and confidence calculation. Then, in view of the particular nature of history short-answer questions, it tries a question-answer matching method based on a convolutional neural network. Candidate answer discovery is recast as a sequence prediction problem, and the convolutional neural network model is used to achieve deeper matching.

Answer generation based on multiple documents. Knowledge base retrieval yields a set of candidate documents that contain the key points of the answer. To extract a concise, accurate, and on-topic answer from this set, the thesis borrows ideas from multi-document summarization: the sentences in the document set are grouped into several clusters by text clustering, and a multi-sentence compression method then extracts the information in each cluster to generate the answer.

To support experimental analysis of system performance, the thesis establishes a unified manual scoring standard and tests the system on past gaokao questions, which demonstrates the effectiveness of the system.
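The cluster-then-compress idea in the answer generation step can be illustrated with a minimal sketch. The abstract does not say which clustering algorithm or compression method the thesis uses, so everything below is an assumption made for illustration: sentences are compared by Jaccard overlap of whitespace tokens, clustering is a greedy single pass, and "compression" is replaced by simply picking the most central sentence of each cluster. The function names and the `threshold` parameter are hypothetical.

```python
from typing import List, Set


def tokens(sentence: str) -> Set[str]:
    """Whitespace tokenization (assumption); Chinese text would need a word segmenter."""
    return set(sentence.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two token sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_sentences(sentences: List[str], threshold: float = 0.3) -> List[List[str]]:
    """Greedy single-pass clustering (assumed stand-in for the thesis's text clustering).

    Each sentence joins the first cluster whose seed (first) sentence is at
    least `threshold` similar; otherwise it starts a new cluster.
    """
    clusters: List[List[str]] = []
    for s in sentences:
        for c in clusters:
            if jaccard(tokens(s), tokens(c[0])) >= threshold:
                c.append(s)
                break
        else:
            clusters.append([s])
    return clusters


def representative(cluster: List[str]) -> str:
    """Pick the sentence most similar on average to the rest of its cluster.

    This is a simplification, not multi-sentence compression: it selects an
    existing sentence instead of fusing several. Ties go to the shorter sentence.
    """
    def score(i: int) -> float:
        others = [tokens(o) for j, o in enumerate(cluster) if j != i]
        if not others:
            return 0.0
        own = tokens(cluster[i])
        return sum(jaccard(own, o) for o in others) / len(others)

    best = max(range(len(cluster)), key=lambda i: (score(i), -len(cluster[i])))
    return cluster[best]


def generate_answer(sentences: List[str], threshold: float = 0.3) -> List[str]:
    """One representative sentence per cluster, in cluster order."""
    return [representative(c) for c in cluster_sentences(sentences, threshold)]


if __name__ == "__main__":
    candidates = [
        "the reform strengthened central authority",
        "the reform strengthened central authority over the provinces",
        "trade along the coast expanded rapidly",
    ]
    for point in generate_answer(candidates):
        print("-", point)
```

In this toy input the first two sentences overlap heavily and fall into one cluster, so the answer keeps one sentence for that point and one for the unrelated third sentence. A system closer to the one described would replace `representative` with a real compression step that merges sentences in a cluster.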
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree awarded]: 2016
[Classification number]: TP391.1


Article ID: 1856725


Article link: https://www.wllwen.com/kejilunwen/sousuoyinqinglunwen/1856725.html

