
Research on Key Technologies of Chinese Pronoun Resolution

Published: 2018-04-16 03:13

  Topics: pronoun resolution + word vectors; Source: Harbin Institute of Technology, 2017 master's thesis


[Abstract]: Anaphora and ellipsis are pervasive in natural language. They introduce ambiguity and make natural language understanding considerably harder, especially in multi-turn dialogue applications such as chatbots. Anaphora resolution has a long research history: from early theoretical approaches such as hand-written rules, through automatic processing techniques built on large-scale corpora, to the machine learning methods in use today, system performance has improved steadily. However, methods for understanding and representing natural-language semantics remain immature, and deep linguistic knowledge and semantic features are still used only in simple ways. As a result, the distinct characteristics of the word, sentence, and discourse levels have not been explored in sufficient depth, and contextual information has not been exploited effectively. This thesis aims to improve context understanding in multi-turn dialogue. It studies the key technologies of Chinese pronoun resolution and ellipsis recovery, with particular attention to their use in chatbot systems. The main contributions are as follows.

(1) A multi-feature fusion algorithm for Chinese pronoun resolution is proposed. It introduces several types of features, including vectorized empirical features, semantic role labeling features, and word vectors, to characterize the semantic and structural properties of mention pairs from multiple angles. The thesis describes the construction and implementation of an overall algorithmic framework for Chinese pronoun resolution based on the mention-pair model. On that basis it examines how the different feature types perform on the task, proposes several feature fusion methods and compares their effectiveness, and, using vector concatenation, evaluates the effect of classifier parameters, word vector dimensionality, and classifier threshold on the results, from which the best experimental results are obtained.

(2) Deep learning is introduced into the pronoun resolution task. Specifically, a long short-term memory (LSTM) network, which is well suited to sequential input, is used to learn deep feature representations of the context of mention pairs, and is applied to both Chinese pronoun resolution and ellipsis recovery. The thesis proposes a Chinese zero-pronoun identification algorithm based on a bidirectional recurrent network, summarizes the problems that arise in zero-pronoun identification, and proposes corresponding rule-based optimizations. It also studies how deep learning models with different network structures perform on Chinese pronoun ellipsis recovery, and identifies a better-performing model and parameter configuration through comparative experiments.

(3) An intelligent chatbot system based on the WeChat platform is implemented. The thesis describes the system's overall architecture, module design, and a demonstration of the system, and explains the pronoun resolution and ellipsis recovery modules. It examines in practice the effectiveness of Chinese pronoun resolution and pronoun ellipsis recovery in an intelligent chatbot system, and presents targeted analysis and optimization for the semantic completion task.
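
As an illustration of the mention-pair approach with feature fusion by vector concatenation described in contribution (1), the following is a minimal Python sketch. It is not the thesis's implementation: the feature definitions, the toy 2-dimensional word vectors, the hand-set weights, and all names (`Mention`, `fuse`, `score`, `resolve`) are assumptions made purely for illustration. A real system would presumably learn the weights with a trained classifier and use pretrained word vectors.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence

# Toy 2-dimensional "word vectors" (hypothetical values).
# A real system would load pretrained embeddings instead.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "小明": [1.0, 0.0],  # a person name, "Xiao Ming"
    "书": [0.0, 1.0],    # "book"
    "他": [0.9, 0.1],    # pronoun "he"
}


@dataclass
class Mention:
    text: str
    sentence_index: int
    is_subject: bool  # crude stand-in for a semantic-role feature


def handcrafted_features(antecedent: Mention, pronoun: Mention) -> List[float]:
    """Hypothetical empirical features: sentence distance and subject role."""
    distance = float(pronoun.sentence_index - antecedent.sentence_index)
    return [distance, 1.0 if antecedent.is_subject else 0.0]


def embedding_features(antecedent: Mention, pronoun: Mention) -> List[float]:
    """Word vectors of the candidate antecedent followed by those of the pronoun."""
    return TOY_EMBEDDINGS[antecedent.text] + TOY_EMBEDDINGS[pronoun.text]


def fuse(antecedent: Mention, pronoun: Mention) -> List[float]:
    """Feature fusion by plain vector concatenation."""
    return handcrafted_features(antecedent, pronoun) + embedding_features(antecedent, pronoun)


def score(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Logistic score in (0, 1) for 'this pair is coreferent'."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def resolve(
    pronoun: Mention,
    candidates: Sequence[Mention],
    weights: Sequence[float],
    bias: float,
    threshold: float = 0.5,
) -> Optional[Mention]:
    """Return the best-scoring candidate above the threshold, or None."""
    best, best_score = None, threshold
    for candidate in candidates:
        s = score(fuse(candidate, pronoun), weights, bias)
        if s > best_score:
            best, best_score = candidate, s
    return best


if __name__ == "__main__":
    pronoun = Mention("他", sentence_index=1, is_subject=False)
    candidates = [
        Mention("小明", sentence_index=0, is_subject=True),
        Mention("书", sentence_index=0, is_subject=False),
    ]
    # Hand-set weights (an assumption), ordered as:
    # [distance, is_subject, antecedent vector (2), pronoun vector (2)]
    weights = [-0.5, 1.0, 2.0, -2.0, 0.0, 0.0]
    best = resolve(pronoun, candidates, weights, bias=0.0, threshold=0.5)
    print(best.text if best else None)
```

The `threshold` argument plays the role of the classifier threshold mentioned in the abstract: raising it makes the resolver more conservative, so that a pronoun is left unresolved when no candidate scores high enough.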
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.1

