Research on Chinese Named Entity Recognition Based on Neural Networks

Published: 2018-05-29 01:01

Topic: Chinese named entity recognition + deep learning; Source: Nanjing Normal University, master's thesis, 2017

[Abstract]: Named entity recognition (NER) is the task of identifying proper nouns such as person, location, and organization names in text. It is one of the key technologies of natural language processing and an important foundation for applications such as information extraction, question answering, and machine translation. Traditional NER methods based on statistical learning models usually require feature engineering. Features strongly affect system performance, but designing feature templates takes substantial manual effort and expert knowledge. To reduce this dependence on hand-crafted features, this thesis adopts deep learning and, taking the characteristics of the Chinese NER task into account, studies neural-network-based methods for Chinese named entity recognition. The main work is as follows. (1) Centering on the NER task and deep learning methods, the thesis discusses and analyzes the difficulties of the task, common NER approaches, the foundations of deep learning, word vectors, and commonly used neural network models. (2) A baseline system for Chinese NER is implemented using neural-network-based character tagging. The method uses a bidirectional long short-term memory (BiLSTM) model and treats Chinese NER as a sequence labeling problem: the vector representations of the characters in a Chinese sentence serve as input features so that context is fully taken into account, and recognition is completed by assigning a tag to each character in the sequence. (3) An exploratory study of neural-network-based segment-level Chinese NER is carried out.
Because words in Chinese sentences are not separated by delimiters, Chinese NER must both segment the given character sequence and classify the resulting entities. Compared with assigning tags to individual characters, assigning a single label to a whole segment is more reasonable, because it avoids a weakness of character-level sequence labeling: entity boundaries are determined only by local tags. The thesis proposes, for the first time, a neural-network-based segment-level Chinese NER method. It uses two neural-network-based model architectures that combine a neural network with a semi-Markov conditional random field (semi-CRF) model, and it completes Chinese NER by assigning a label to each segment as a whole. A series of experiments on the proposed methods shows that the segment-level method achieves a significant performance improvement over the baseline system.
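The segment-level idea in the abstract can be illustrated with a small decoding sketch. This is not the thesis's implementation: the thesis obtains segment scores from neural networks, whereas the hypothetical `seg_score` below is a toy lexicon lookup, and the names `semi_markov_decode`, `trans_score`, `LEXICON`, and the example sentence are all assumptions made for illustration. The sketch only shows how a semi-Markov model labels whole segments rather than single characters.

```python
from typing import Callable, List, Tuple

Segment = Tuple[int, int, str]  # (start, end_exclusive, label)


def semi_markov_decode(
    n: int,
    labels: List[str],
    max_len: int,
    seg_score: Callable[[int, int, str], float],
    trans_score: Callable[[str, str], float],
) -> List[Segment]:
    """Viterbi search over segmentations of positions 0..n-1."""
    neg_inf = float("-inf")
    # best[j][y]: best score of a segmentation of the first j characters
    # whose last segment carries label y.
    best = [{y: neg_inf for y in labels} for _ in range(n + 1)]
    back = [{y: None for y in labels} for _ in range(n + 1)]
    for j in range(1, n + 1):
        for y in labels:
            for d in range(1, min(max_len, j) + 1):
                i = j - d  # candidate last segment covers [i, j)
                s = seg_score(i, j, y)
                if i == 0:
                    cand, prev = s, None
                else:
                    prev = max(labels, key=lambda p: best[i][p] + trans_score(p, y))
                    cand = best[i][prev] + trans_score(prev, y) + s
                if cand > best[j][y]:
                    best[j][y] = cand
                    back[j][y] = (i, prev)
    # Follow back-pointers from the best final label.
    segments: List[Segment] = []
    j, y = n, max(labels, key=lambda p: best[n][p])
    while j > 0:
        i, prev = back[j][y]
        segments.append((i, j, y))
        j, y = i, prev
    return segments[::-1]


# Toy scorer (assumption): a lexicon stands in for a neural segment scorer.
sentence = "南京师范大学在南京"
LEXICON = {"南京师范大学": "ORG", "南京": "LOC"}


def seg_score(i: int, j: int, y: str) -> float:
    text = sentence[i:j]
    if y == "O":
        # Non-entity segments are single characters in this toy setup.
        return 0.0 if j - i == 1 else -1.0
    # Reward lexicon matches in proportion to length, penalize everything else.
    return 2.0 * (j - i) if LEXICON.get(text) == y else -5.0


def trans_score(prev_label: str, label: str) -> float:
    return 0.0  # no label-transition preference in this sketch


segments = semi_markov_decode(len(sentence), ["O", "LOC", "ORG"], 6, seg_score, trans_score)
entities = [(sentence[i:j], y) for i, j, y in segments if y != "O"]
print(entities)
```

Because each candidate segment is scored as a unit, the six-character organization name is chosen as one ORG segment instead of being split into a two-character LOC followed by unlabeled characters, which is the boundary problem the abstract attributes to character-level tagging.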
[Degree-granting institution]: Nanjing Normal University
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP391.1
