
Research on Chinese Named Entity Recognition Based on Deep Learning

Posted: 2018-04-27 07:39

Topics: Chinese named entity recognition + deep learning; Source: Beijing University of Technology, master's thesis, 2015


【Abstract】: Chinese named entity recognition (NER) is one of the basic tasks in natural language processing and a foundational component of composite NLP applications such as automatic question answering and information extraction. Over the past decade or more, researchers in China and abroad have studied entity recognition in text extensively and in depth. With the rapid growth of the Internet, however, large volumes of irregular, multi-domain text keep accumulating, which places new demands on NER technology. The main work of this thesis is as follows:

(1) It surveys NER approaches in China and abroad and analyzes current mainstream models, methods, and technical trends. After summarizing the shortcomings of the mainstream methods and the particular characteristics of Chinese NER, it proposes a new approach: applying deep learning theory to Chinese NER.

(2) It proposes a deep neural network model based on a stacked autoencoder classifier and studies its application to NER in depth. It solves the problem of converting Chinese text sequences into model input vectors and derives vectorized forward- and backward-propagation formulas that are convenient to implement. It also summarizes an effective set of parameter initialization and tuning methods, which improve both the training process and the entity tagging results.

(3) Building on this model, it carries out a large number of comparative experiments. The results show that the deep neural network tagging model recognizes Chinese entities well and reaches the current state of the art on the People's Daily corpus. It is especially stronger than the conditional random field (CRF) model on place names and organization names: compared with the CRF, recall improves by 9.60% for place names and 8.84% for organization names, and the F-value improves by 3.76% and 2.35%, respectively.

(4) It implements a Chinese NER system built on the deep neural network model and proposes a semi-automated incremental-learning workflow: the system combines boundary entropy with incremental training in a semi-supervised post-processing step, replacing the traditional framework that combines rules with statistics. This addresses the scarcity of annotated Chinese corpora and the high training and maintenance costs encountered in practice, allowing the system to process massive amounts of Chinese text quickly and effectively with only a small amount of manual intervention.

Practice shows that neural network models based on deep learning theory can be applied well to Chinese NER. The Chinese NER system built around this model is robust and maintainable and can meet the new requirements of Chinese NER in the big-data era.
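Point (2) names three ingredients: converting a character sequence into input vectors, a stacked autoencoder classifier, and vectorized forward/backward propagation. The abstract does not give the thesis's actual formulation, so what follows is only a minimal NumPy sketch of the generic technique under stated assumptions: each character is represented by the concatenated one-hot codes of a five-character window (id 0 reserved for padding), each hidden layer is pretrained greedily as a sigmoid autoencoder with squared-error loss, and a softmax layer is then added and the whole stack is fine-tuned with cross-entropy by batch gradient descent. The BIO tag set, layer sizes (16 and 8), learning rates, epoch counts, N(0, 0.1) initialization, and all function names are hypothetical placeholders, not the thesis's model or its initialization and tuning scheme.

```python
# Hypothetical sketch of a generic stacked-autoencoder window tagger; not the thesis's exact model.
import numpy as np

def build_vocab(sentences):
    """Map each character to an integer id; id 0 is reserved for padding/unknown."""
    vocab = {"<PAD>": 0}
    for sent in sentences:
        for ch in sent:
            vocab.setdefault(ch, len(vocab))
    return vocab

def window_vectors(sentence, vocab, half=2):
    """One row per character: concatenated one-hot codes of its (2*half+1)-char window."""
    ids = [vocab.get(ch, 0) for ch in sentence]
    padded = [0] * half + ids + [0] * half
    V, k = len(vocab), 2 * half + 1
    X = np.zeros((len(ids), k * V))
    for i in range(len(ids)):
        for j, cid in enumerate(padded[i:i + k]):
            X[i, j * V + cid] = 1.0
    return X

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))  # shift for numerical stability
    return e / e.sum(axis=1, keepdims=True)

def pretrain_layer(X, hidden, rng, lr=0.5, epochs=200):
    """Train one sigmoid autoencoder on X with squared error; return only the encoder."""
    n, d = X.shape
    W1, b1 = rng.normal(0, 0.1, (d, hidden)), np.zeros(hidden)
    W2, b2 = rng.normal(0, 0.1, (hidden, d)), np.zeros(d)
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)                # encode the whole batch at once
        R = sigmoid(H @ W2 + b2)                # reconstruct the input
        dR = (R - X) * R * (1 - R) / n          # gradient at decoder pre-activation
        dH = (dR @ W2.T) * H * (1 - H)          # gradient at encoder pre-activation
        W2 -= lr * H.T @ dR
        b2 -= lr * dR.sum(axis=0)
        W1 -= lr * X.T @ dH
        b1 -= lr * dH.sum(axis=0)
    return W1, b1

def forward(X, layers):
    """Return [X, h1, h2, ...]: the input and every hidden activation of the stack."""
    acts = [X]
    for W, b in layers:
        acts.append(sigmoid(acts[-1] @ W + b))
    return acts

def fine_tune(X, y, layers, n_tags, rng, lr=0.2, epochs=500):
    """Add a softmax layer; backpropagate cross-entropy through all layers (in place)."""
    n = X.shape[0]
    Ws, bs = rng.normal(0, 0.1, (layers[-1][0].shape[1], n_tags)), np.zeros(n_tags)
    T = np.eye(n_tags)[y]                       # one-hot targets, shape (n, n_tags)
    losses = []
    for _ in range(epochs):
        acts = forward(X, layers)
        P = softmax(acts[-1] @ Ws + bs)
        losses.append(-np.log(P[np.arange(n), y]).mean())
        dZ = (P - T) / n                        # gradient at softmax pre-activation
        dA = dZ @ Ws.T                          # computed before Ws is updated
        Ws -= lr * acts[-1].T @ dZ
        bs -= lr * dZ.sum(axis=0)
        for i in reversed(range(len(layers))):
            W, b = layers[i]
            dZh = dA * acts[i + 1] * (1 - acts[i + 1])
            dA = dZh @ W.T                      # propagate with the pre-update W
            W -= lr * acts[i].T @ dZh           # in place: updates the arrays in `layers`
            b -= lr * dZh.sum(axis=0)
    return (Ws, bs), losses

TAGS = ["O", "B-LOC", "I-LOC"]                  # hypothetical BIO tag set
sent, gold = "我在北京工作", ["O", "O", "B-LOC", "I-LOC", "O", "O"]
rng = np.random.default_rng(0)
vocab = build_vocab([sent])
X = window_vectors(sent, vocab)
y = np.array([TAGS.index(t) for t in gold])

layers, inp = [], X
for size in (16, 8):                            # greedy layer-wise pretraining
    W, b = pretrain_layer(inp, size, rng)
    layers.append((W, b))
    inp = sigmoid(inp @ W + b)                  # next layer trains on these codes

(Ws, bs), losses = fine_tune(X, y, layers, len(TAGS), rng)
pred = softmax(forward(X, layers)[-1] @ Ws + bs).argmax(axis=1)
print([TAGS[i] for i in pred])
```

Every forward and backward step above operates on the whole batch matrix at once (one row per character), which is the practical benefit of vectorized propagation: there is no Python loop over training examples. A realistic experiment would train on a full annotated corpus such as the People's Daily data rather than a single sentence.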
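Point (4) relies on boundary entropy, whose exact definition in the thesis is not given in the abstract. A common formulation (branching entropy) measures how varied the characters immediately to the left and right of a candidate string are across a corpus; a string with high entropy on both sides is more likely to be a complete unit, such as an entity name, than a fragment of one. The sketch below assumes that definition, natural logarithms, and a single shared symbol `<S>` for sentence boundaries; `boundary_entropy` and the toy corpus are hypothetical.

```python
# Assumed branching-entropy definition; not necessarily the thesis's formula.
import math
from collections import Counter

def _entropy(counts):
    """Shannon entropy (nats) of a neighbour-frequency table; 0.0 if empty."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(-c / total * math.log(c / total) for c in counts.values())

def boundary_entropy(corpus, candidate):
    """Return (left, right) boundary entropy of `candidate` over `corpus`.

    Neighbours are the characters immediately before/after each occurrence;
    a sentence boundary is counted as the single symbol "<S>".
    """
    left, right = Counter(), Counter()
    for sent in corpus:
        start = sent.find(candidate)
        while start != -1:
            end = start + len(candidate)
            left[sent[start - 1] if start > 0 else "<S>"] += 1
            right[sent[end] if end < len(sent) else "<S>"] += 1
            start = sent.find(candidate, start + 1)   # allow overlapping matches
    return _entropy(left), _entropy(right)

corpus = ["我在北京工作", "他去北京开会", "北京大学很大", "从北京出发"]
print(tuple(round(h, 3) for h in boundary_entropy(corpus, "北京")))  # (1.386, 1.386)
print(tuple(round(h, 3) for h in boundary_entropy(corpus, "北")))    # (1.386, 0.0)
```

In the toy corpus, `北京` gets ln 4 ≈ 1.386 on both sides, while `北` has zero right entropy because it is always followed by `京`, which signals that `北` alone is not a complete unit. How the thesis thresholds these scores and feeds candidates into incremental training is not described in the abstract.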
【Degree-granting institution】: Beijing University of Technology
【Degree level】: Master's
【Year degree awarded】: 2015
【Classification number】: TP391.1


Article ID: 1809857


Article link: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/1809857.html

