A Method for Recognizing Sentence Relations in Interactive Question Answering

Published: 2018-03-23 06:05

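Topic: question-answer matching relations　Focus: complementary relations　Source: Harbin Institute of Technology, 2017 master's thesis　Type: degree thesis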

[Abstract]: With the development of Internet technology and the rapid growth of information, people urgently need an accurate and efficient way to obtain information. From search engines to intelligent interactive question answering systems, information access has moved ever closer to natural interaction. Driven on the one hand by the emergence of massive data and on the other by major advances in machine learning and natural language processing, question answering systems have entered a stage of intelligent interactive question answering that spans many domains, draws on free text and heterogeneous information, and is generative. Unlike a search engine, a question answering system does not require the user to choose among multiple candidate documents; it better understands questions stated in natural language and returns concise, precise answers. With the successful launch of Siri and Watson, intelligent interactive question answering has become a research hotspot in recent years, and it shows growing potential to replace human customer service agents in commercial settings. Building a more intelligent interactive question answering system, however, makes it very important to learn from existing customer service logs, and the key to building such a learning system is identifying, in complex interactive customer service logs, the matching relations between questions and answers and the complementary relations between consecutive utterances. This thesis studies the recognition of sentence matching relations and complementary relations in interactive question answering.

For the problem of matching customer questions to customer service replies, the thesis builds a CNN-based semantic matching model and an RNN-based generative model. The input layer of each model is the word-vector matrix of a sentence, and the output layer is the confidence that a question and an answer match. The models are compared on SemEval-2016 community question answering data and on online customer service dialogue data. Comparative experiments also examine question completeness, different structures of the generative model, threshold selection, and the way customer service data are extracted. The results show that on the community question answering data the CNN-based matching model outperforms the RNN generative model, whereas on the customer service dialogue data the RNN-based sequence learning model better captures the contextual information in scenario dialogues. On data organized per dialogue round with complete questions, MAP reaches 84.41%.

For the latent, context-dependent semantic complementary relations between consecutive utterances in interactive question answering, the thesis studies the recognition of sentence complementary relations. As deep models, a parallel CNN and a cascaded LSTM are built to extract and model abstract semantic features of sentence pairs. A support vector machine, a CNN-based model, and an RNN-based model are each used to classify the complementary relation of sentence pairs. The results show that the CNN-based recognition method outperforms the other methods compared, with an F1 score of 67.8%. Finally, complementary relation recognition and matching relation recognition are combined and applied to semantic matching in interactive question answering.
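
The abstract reports two scores: MAP for question-answer matching and F1 for complementary-relation classification. The sketch below shows the textbook definitions of both, together with ranking candidate answers by model confidence. The function names and the toy labels are illustrative assumptions. The thesis's own evaluation scripts are not described in the abstract and may differ in details, for example in how questions without any correct answer are handled.

```python
from typing import List, Sequence


def rank_by_confidence(confidences: Sequence[float], labels: Sequence[int]) -> List[int]:
    """Reorder 0/1 relevance labels by descending match confidence."""
    order = sorted(range(len(confidences)), key=lambda i: confidences[i], reverse=True)
    return [labels[i] for i in order]


def average_precision(ranked_labels: Sequence[int]) -> float:
    """AP for one question; labels are 1 (correct answer) or 0, in ranked order."""
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank  # precision at each correct answer
    return precision_sum / hits if hits else 0.0


def mean_average_precision(per_question: List[Sequence[int]]) -> float:
    """MAP: mean of AP over questions.

    Assumption: a question with no correct answer contributes 0; some
    evaluation scripts skip such questions instead.
    """
    if not per_question:
        return 0.0
    return sum(average_precision(q) for q in per_question) / len(per_question)


def f1_score(gold: Sequence[int], predicted: Sequence[int]) -> float:
    """F1 of the positive class (1 = a complementary relation holds)."""
    tp = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

To score one question, a caller would pass the model's confidences and the gold labels for its candidate answers to `rank_by_confidence`, then feed the ranked labels of all questions to `mean_average_precision`.
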
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.1

[References]