A Method for Automatically Identifying Questions to Be Improved in an Interactive Question Answering System

Published: 2018-04-09 05:26

Topic: question answering systems　Focus: knowledge base expansion　Source: Harbin Institute of Technology, master's thesis, 2013

[Abstract]: As the Internet continues to develop, people are no longer satisfied with merely using search engines to find the information they need. How to provide users with that information quickly and conveniently has become a focus of research. An automatic question answering system both meets users' information needs and gives human-like replies, so it addresses this problem well. However, traditional question answering systems have no mechanism for automatically identifying existing questions whose answers are unsatisfactory, which makes it hard to improve the system or update its knowledge base. To remedy this shortcoming, this thesis studies and proposes a method for automatically identifying questions to be improved in an interactive question answering system, and analyzes the identification performance obtained with user emotion features, user intent features, and mixed features. Identification that previously required manual review is carried out automatically, which removes the manual identification work and improves efficiency. To identify the questions to be improved more effectively, the thesis designs a mixed-feature-oriented knowledge base expansion method that uses web crawler tools to expand the knowledge base corpus to 39,161 entries; this question-answer corpus, which covers multiple domains and aspects, basically meets users' conversational needs. Building on this work, the system architecture and the portability of the runtime platform were improved, and the Bit Robot (比特机器人) question answering system can now run on three platforms: WeChat, QQ, and the web. This multi-platform operation has attracted a large number of users to the system. Once the questions to be improved have been identified, correct answers are obtained through manual review, and the improved questions and answers are then written back to the system knowledge base, thereby updating it. The experimental data come from real question-answer logs collected on the system's WeChat platform, 3,119 question-answer pairs in total. The identification method was determined by annotating and analyzing these real conversations. The final identification accuracy for questions to be improved is 76.77%. The experimental results and the system's behavior in actual operation demonstrate the feasibility of the proposed method.
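The identification step described in the abstract, which combines user emotion and user intent signals into a mixed feature, could be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not say which classifier or concrete features the thesis uses, so the cue lists, the `Turn` record, the threshold, and all function names here are hypothetical, and a simple keyword rule stands in for whatever model the author actually trained.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical cue lists; the thesis does not publish its actual features.
NEGATIVE_EMOTION_CUES = ("wrong", "nonsense", "stupid", "useless")
REPHRASE_INTENT_CUES = ("i mean", "i asked", "that is not what", "again")


@dataclass
class Turn:
    question: str   # what the user asked
    answer: str     # what the system replied
    follow_up: str  # the user's next message after the reply


def count_cues(text: str, cues) -> int:
    """Count how many cue phrases occur in the lowercased text."""
    lowered = text.lower()
    return sum(1 for cue in cues if cue in lowered)


def needs_improvement(turn: Turn, threshold: int = 1) -> bool:
    """Flag a question when the mixed (emotion + intent) score reaches the threshold."""
    emotion = count_cues(turn.follow_up, NEGATIVE_EMOTION_CUES)
    intent = count_cues(turn.follow_up, REPHRASE_INTENT_CUES)
    return emotion + intent >= threshold


def collect_candidates(turns: List[Turn]) -> List[str]:
    """Return the questions whose answers should go to manual review."""
    return [t.question for t in turns if needs_improvement(t)]


log = [
    Turn("What's the weather today?", "I like music.",
         "That is wrong, I asked about the weather"),
    Turn("Who are you?", "I am a chat robot.", "Thanks, that helps"),
]
candidates = collect_candidates(log)
```

In this sketch only the first question is flagged, because the user's follow-up contains both a negative emotion cue and a rephrasing cue. The flagged questions would then go to the manual review step that the abstract describes for obtaining corrected answers.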
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree conferred]: 2013
[Classification number]: TP393.09
