Entity Relation Extraction Based on Deep Convolutional Neural Networks

Published: 2018-04-04 05:04

  Topic: relation extraction; Approach: deep convolutional neural network; Source: Taiyuan University of Technology, master's thesis, 2017


[Abstract]: Entity relation extraction has long been an active research topic in natural language processing. Accurately identifying the semantic relation between two entities is crucial to information extraction, and it also matters for knowledge base construction and information retrieval. Following the rapid progress of deep learning in image and vision tasks, deep learning has in recent years been introduced into natural language processing and has become a research focus there as well. Traditional entity relation extraction methods require discrete features to be selected by hand before model training, and the quality of that selection directly determines the final extraction results. It is impossible to know in advance which features will be most effective, more features are not necessarily better, and judging feature effectiveness mostly relies on expert experience. Feature selection also depends largely on existing natural language processing (NLP) tools, which is time-consuming and laborious and tends to propagate errors. By contrast, relation extraction algorithms based on deep learning can learn features automatically from the raw corpus, which both reduces the dependence on NLP tools and makes full use of the structural information in the text. Previous work has also shown that the convolutional neural network (CNN), with its distinctive network structure, learns features particularly well. On this basis, this thesis uses a deep convolutional neural network for entity relation extraction.

First, a sentence-based algorithm for measuring word importance, TP-ISP (term proportion-inverse sentence proportion), is proposed. The algorithm yields a tpisp value for each word in each category, and sorting by this value gives a ranking of word importance. Then, the top-ranked words are selected as keyword features that characterize the category and are fed to the network as initial input together with the word vector features and word position features of the original sentence. This mitigates a weakness of existing deep learning methods, which rely on word vectors alone to learn features. Adding the category keyword features increases the separability between categories and compensates for the limitations of the features the network learns automatically. Finally, in the network training stage, a piecewise max pooling strategy is adopted: the highest-scoring feature in each segment is selected, and these features are combined as the input to the final classifier. To some extent, this strategy reduces the information loss caused by conventional max pooling.

In addition, because resources such as Chinese corpora are scarce, relatively little research has addressed relation extraction for Chinese. This thesis therefore takes the dataset from the COAE (Chinese Opinion Analysis Evaluation) 2016 evaluation task and adapts the model to the particular characteristics of Chinese text to perform Chinese entity relation extraction. A Chinese word vector table was trained with the Skip-gram model of the word2vec tool on Chinese Wikipedia data, and it outperformed a word vector table generated with word2vec's random initialization alone. Experiments show that the model substantially improves entity relation extraction results on both English and Chinese corpora.
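The abstract names TP-ISP but does not give its formula. Purely as an illustration, the sketch below assumes a TF-IDF-style definition: "term proportion" is a word's share of all tokens in a category, and "inverse sentence proportion" is the log of the total number of sentences divided by the number of sentences containing the word. The function names, the toy corpus, and the category labels `cause` and `part_of` are hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter


def tpisp_scores(sentences_by_category):
    """Score each word per category.

    ASSUMED definition (not the thesis's formula):
        tpisp = (word count in category / tokens in category)
                * log(total sentences / sentences containing the word)
    """
    all_sentences = [s for sents in sentences_by_category.values() for s in sents]
    total_sentences = len(all_sentences)

    # Number of sentences (over all categories) in which each word occurs.
    sentence_freq = Counter()
    for sent in all_sentences:
        sentence_freq.update(set(sent))

    scores = {}
    for category, sents in sentences_by_category.items():
        term_counts = Counter(w for sent in sents for w in sent)
        total_terms = sum(term_counts.values())
        scores[category] = {
            w: (c / total_terms) * math.log(total_sentences / sentence_freq[w])
            for w, c in term_counts.items()
        }
    return scores


def top_keywords(scores, k):
    """Return the k highest-scoring words per category (ties broken alphabetically)."""
    return {
        category: [w for w, _ in sorted(ws.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]
        for category, ws in scores.items()
    }


# Hypothetical toy corpus: tokenized sentences grouped by relation category.
toy_corpus = {
    "cause": [
        ["smoking", "causes", "cancer"],
        ["the", "flood", "caused", "damage"],
    ],
    "part_of": [
        ["the", "wheel", "of", "the", "car"],
        ["the", "engine", "is", "part", "of", "the", "car"],
    ],
}

toy_scores = tpisp_scores(toy_corpus)
toy_keywords = top_keywords(toy_scores, 3)
```

Under this assumed definition, a word such as "the" that occurs in most sentences receives a low score in every category, so it is unlikely to be picked as a category keyword.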
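The piecewise max pooling step can be sketched as follows. The abstract says only that the highest-scoring feature in each segment is kept; it does not say how segments are defined. This sketch assumes the common choice of splitting each filter's output into three segments at the two entity positions. The function name, the argument names, and the rule of using 0.0 for an empty trailing segment are assumptions made for illustration.

```python
def piecewise_max_pool(feature_map, e1_idx, e2_idx):
    """Pool each filter's output over three segments split at the entity positions.

    feature_map: list of rows, one per convolution filter; each row holds one
                 score per token position.
    e1_idx, e2_idx: token indices of the two entities (ASSUMED segment boundaries).
    Returns a flat list with three pooled values per filter.
    """
    if not feature_map or not 0 <= e1_idx < e2_idx < len(feature_map[0]):
        raise ValueError("expected 0 <= e1_idx < e2_idx < sequence length")

    pooled = []
    for row in feature_map:
        segments = (
            row[: e1_idx + 1],            # start of sentence through entity 1
            row[e1_idx + 1 : e2_idx + 1],  # after entity 1 through entity 2
            row[e2_idx + 1 :],             # after entity 2 to end of sentence
        )
        # The last segment is empty when entity 2 is the final token;
        # 0.0 is used as a placeholder in that case (an assumption).
        pooled.extend(max(seg, default=0.0) for seg in segments)
    return pooled


def global_max_pool(feature_map):
    """Conventional max pooling: one value per filter, for comparison."""
    return [max(row) for row in feature_map]


# Two hypothetical filters over a six-token sentence, entities at positions 1 and 3.
example_map = [
    [0.1, 0.9, 0.3, 0.2, 0.8, 0.4],
    [0.5, 0.2, 0.7, 0.6, 0.1, 0.3],
]
piecewise = piecewise_max_pool(example_map, 1, 3)  # [0.9, 0.3, 0.8, 0.5, 0.7, 0.3]
conventional = global_max_pool(example_map)        # [0.9, 0.7]
```

In the example, conventional pooling keeps one value per filter, while the piecewise variant keeps three, which is the sense in which it loses less positional information.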
[Degree-granting institution]: Taiyuan University of Technology
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.1; TP183

Article ID: 1708486


Article link: https://www.wllwen.com/kejilunwen/zidonghuakongzhilunwen/1708486.html

