
Research on the Optimization of Polysemous Word Vectors

Published: 2018-05-06 09:54

Topic: representation learning + multi-feature fusion; Source: Master's thesis, Beijing University of Posts and Telecommunications, 2016


[Abstract]: With the rapid development of neural network algorithms and distributed parallel computing, the problem of text representation has returned to the spotlight. As a fundamental problem in natural language processing, effectively representing abstract and complex human language has always been an unavoidable challenge, and the exponential growth of Internet data in recent years has made it even more prominent. Neural-network-based word representation learning tackles this problem with the word as the smallest unit. Such models not only exploit the information in large corpora, but also reduce training time complexity through various optimization techniques, making it easy to obtain representation vectors that preserve semantic and syntactic information and providing a solid feature foundation for other natural language processing tasks. Word vectors have achieved good results in tasks such as information retrieval, sentiment analysis, and machine translation, but there is still room for improvement. Against this background, this thesis carries out the following work. First, it studies representation learning methods for words and their optimization strategies, and proposes a multi-feature fusion optimization method for word vectors that integrates prior part-of-speech information, position weight factors, and paragraph vectors, improving accuracy on the word analogy test by two percentage points over the original model. Second, it identifies the weakness of word vectors in distinguishing antonyms, investigates the factors behind this distinction, and verifies the model's discrimination ability on a synonym-antonym set. Third, building on the Skip-gram model, it proposes and implements an online learning model for polysemous words that learns multiple sense vectors per word, and again applies multi-feature fusion to further improve the polysemous word model, achieving results on par with the current state of the art.
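For readers unfamiliar with multi-sense extensions of Skip-gram, the sketch below illustrates only the general idea of keeping several candidate vectors per word and selecting one by context similarity during online training. All names, array shapes, and the simplified update rule are illustrative assumptions for this sketch; they are not the thesis implementation and omit negative sampling and the multi-feature fusion described above.

```python
# Minimal sketch (assumptions only): one online step of a multi-sense Skip-gram-style model.
import numpy as np

rng = np.random.default_rng(0)
vocab_size, n_senses, dim = 10_000, 3, 100  # assumed sizes for illustration

# Each word keeps several sense vectors plus matching context-cluster statistics.
sense_vecs = rng.normal(scale=0.1, size=(vocab_size, n_senses, dim))
cluster_centers = rng.normal(scale=0.01, size=(vocab_size, n_senses, dim))
cluster_counts = np.ones((vocab_size, n_senses))
context_vecs = rng.normal(scale=0.1, size=(vocab_size, dim))


def pick_sense(word_id: int, context_ids: list[int]) -> int:
    """Choose the sense whose context cluster is most similar to the current context."""
    ctx = context_vecs[context_ids].mean(axis=0)
    centers = cluster_centers[word_id] / cluster_counts[word_id][:, None]
    sims = centers @ ctx / (np.linalg.norm(centers, axis=1) * np.linalg.norm(ctx) + 1e-8)
    return int(np.argmax(sims))


def online_update(word_id: int, context_ids: list[int], lr: float = 0.025) -> None:
    """One online step: select a sense, then nudge it toward the averaged context."""
    s = pick_sense(word_id, context_ids)
    ctx = context_vecs[context_ids].mean(axis=0)
    cluster_centers[word_id, s] += ctx                                # update matched context cluster
    cluster_counts[word_id, s] += 1
    sense_vecs[word_id, s] += lr * (ctx - sense_vecs[word_id, s])     # simplified positive-pair update


# Example: update word 42 given a small context window of word ids.
online_update(42, [7, 99, 1234, 56])
```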
【Degree-granting institution】: Beijing University of Posts and Telecommunications
【Degree level】: Master's
【Year conferred】: 2016
【CLC number】: TP391.1;TP18



