
An Automatic Question Generation System Based on Entity-Type Encyclopedia Knowledge

Published: 2018-05-04 01:15

  Topics: interactive question answering + question generation; Source: Harbin Institute of Technology, master's thesis, 2012


[Abstract]: With the explosive growth of online information, the web is flooded with content of every kind, and people have become accustomed to searching it for solutions to their problems. Users who are unfamiliar with search techniques often spend a long time filtering the results a search engine returns. Interactive question answering systems address this information overload: they analyze the question a user submits using natural language processing, retrieve relevant answers, and return them to the user.

Automatic question generation supplies question-answer pairs to an interactive question answering system without requiring human-computer interaction. Depending on the system's needs, these pairs can be restricted to a particular domain or serve as general-domain pairs. Automatic question generation for English has advanced considerably and has been applied to question answering systems, dialogue systems, and tutoring systems. Research on automatic question generation for Chinese has only just begun, and many problems remain open. To address the incompleteness of the corpora available to Chinese question answering systems, this thesis proposes supplementing them with automatically generated Chinese question-answer pairs.

The thesis covers the following:

1. Automatic generation of Chinese questions. Current question generation systems cannot understand the meaning of a sentence directly as a person does, so every such system must preprocess its input before generating questions. This thesis adopts a distributed design: Chinese information extraction is divided into two major parts covering seven types of information, each handled by a separate functional unit machine, and the processed results are returned to the question generation system. The thesis designs a question generation algorithm that combines syntactic information with sentence-pattern information and uses both to generate wh-questions or causal questions.

2. Automatic classification of generated questions. The thesis proposes an algorithm based on named entity classification and partial template matching, which produces six categories of questions: person-name questions, place-name questions, time-expression questions, organization-name questions, definition questions, and causal questions.

3. Evaluation and improvement of the system. English question generation systems have defined a series of evaluation criteria. This thesis borrows some of these criteria to evaluate the system, and also invites a number of users to test it, refining and extending the system according to their feedback.
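The abstract does not give the details of the generation or classification algorithms, so the following is only a minimal sketch of the general idea, under stated assumptions: entities have already been recognized by an upstream preprocessing step, each entity is replaced by a question word chosen from its type, and a single fixed pattern ("因为 X,所以 Y。") stands in for causal sentence detection. The entity tags (`PER`, `LOC`, `TIME`, `ORG`), the question words, and every function name are hypothetical and are not taken from the thesis. Definition questions are omitted.

```python
from dataclasses import dataclass

# Hypothetical mapping: entity type -> (question category, Chinese question word).
QUESTION_WORDS = {
    "PER": ("person", "谁"),
    "LOC": ("place", "哪里"),
    "TIME": ("time", "什么时候"),
    "ORG": ("organization", "哪个机构"),
}


@dataclass
class Entity:
    text: str  # surface string of the entity as it appears in the sentence
    type: str  # entity type tag assigned by an (assumed) upstream recognizer


def generate_questions(sentence, entities):
    """Return (category, question, answer) triples, one per usable entity."""
    body = sentence.rstrip("。")
    results = []
    for ent in entities:
        # Skip entity types without a template and entities not found in the text.
        if ent.type not in QUESTION_WORDS or ent.text not in body:
            continue
        category, word = QUESTION_WORDS[ent.type]
        # Replace only the first occurrence of the entity with the question word.
        question = body.replace(ent.text, word, 1) + "?"
        results.append((category, question, ent.text))
    return results


def generate_causal_question(sentence):
    """Turn '因为 X,所以 Y。' into ('causal', '为什么Y?', X); otherwise None."""
    body = sentence.rstrip("。")
    if not body.startswith("因为") or ",所以" not in body:
        return None
    cause, effect = body[len("因为"):].split(",所以", 1)
    return ("causal", "为什么" + effect + "?", cause)


if __name__ == "__main__":
    sentence = "鲁迅1881年出生于绍兴。"
    entities = [Entity("鲁迅", "PER"), Entity("1881年", "TIME"), Entity("绍兴", "LOC")]
    for triple in generate_questions(sentence, entities):
        print(triple)
    print(generate_causal_question("因为下雨,所以比赛取消。"))
```

Because each question's category follows directly from the entity type that was replaced, generation and classification happen in one pass here. A real system would need syntactic analysis to choose natural question words and word order, which this sketch does not attempt.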
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree awarded]: 2012
[Classification number]: TP391.3





