
Research on Affective Word Vectors and Sentiment Classification Methods Based on Deep Learning

Published: 2018-06-30 02:13

  Topics: sentiment analysis + word vectors; Source: Master's thesis, Xi'an University of Technology, 2017


[Abstract]: Sentiment analysis identifies the sentiment polarity (positive, negative, or neutral) or the sentiment intensity (strong or weak) of a given text or of its fragments, such as sentences, phrases, or words. Applied to product reviews, it can identify users' sentiment toward aspects of product design and so support decision-making by merchants and product designers. Most previous work builds recognition systems by combining hand-crafted features with traditional machine learning algorithms. Hand-crafting features, however, requires domain experts' knowledge, which makes such systems less practical and expensive in human labor. In recent years researchers have turned to deep learning to extract features automatically. One of the most fundamental results of deep learning in natural language processing (NLP) is the word vector, i.e., the distributed representation of words, which has been applied to many NLP tasks. Conventional word vectors, however, are learned from context words and therefore encode only semantic and syntactic information. Because most existing word-vector methods model only a word's syntactic context and ignore its sentiment, which is essential for sentiment analysis, they handle sentiment classification poorly: words of opposite polarity such as "good" and "bad" often occur in similar contexts and can end up with similar vectors. To address this problem, this thesis first proposes a deep-learning model for training affective word vectors that uses two simple strategies to combine the sentiment information in a text with the context words of the current word.
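The abstract does not describe the two strategies. Purely as a hypothetical illustration of one way to combine a text's sentiment with a word's context words, the sketch below builds CBOW-style (context, target) training pairs in which the document's sentiment label is appended to every context window as a pseudo-word. The pseudo-tokens `<POS>`/`<NEG>`, the function name, and the window size are assumptions, not the thesis's method.

```python
from typing import List, Tuple

# Hypothetical pseudo-tokens standing in for document-level sentiment labels.
SENTIMENT_TOKENS = {1: "<POS>", 0: "<NEG>"}


def cbow_pairs_with_sentiment(
    tokens: List[str], label: int, window: int = 2
) -> List[Tuple[List[str], str]]:
    """Build (context, target) pairs for a CBOW-style model.

    Each context holds up to `window` words on either side of the target,
    plus one pseudo-token encoding the document's sentiment label, so the
    target is predicted from its neighbours *and* the text's sentiment.
    """
    sentiment = SENTIMENT_TOKENS[label]
    pairs = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        pairs.append((left + right + [sentiment], target))
    return pairs


if __name__ == "__main__":
    review = ["screen", "is", "very", "clear"]  # toy positive review
    for context, target in cbow_pairs_with_sentiment(review, label=1):
        print(target, "<-", context)
```

The design choice in this sketch is to leave the embedding trainer unchanged and inject sentiment only through the training data; whether the thesis does anything similar is not stated in the abstract.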
To verify whether the learned affective word vectors accurately capture both sentiment and the semantics of context words, affective word vectors are trained on datasets in different languages and domains, and quantitative experiments are carried out at the word level. To apply these affective word representations to long texts, the thesis draws on semi-supervised learning theory and combines the adaptive learning of a deep belief network (DBN) with active learning, which effectively addresses the problem of selecting samples for long-text sentiment classification in semi-supervised learning. The same deep network structure is used throughout semi-supervised active learning, so the network is retrained over multiple iterations of the active-learning process and its ability to learn abstract features for classification gradually improves. Finally, to process today's massive volumes of text more efficiently for sentiment classification, the thesis stores text data in a distributed manner on HDFS and uses the Spark distributed in-memory computing framework to parallelize both text preprocessing and DBN training. Experiments show that the distributed DBN greatly shortens training time and speeds up computation.
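The abstract does not say which criterion selects the unlabeled texts to query during active learning. A common choice, assumed here, is uncertainty sampling: query the texts whose predicted probability of being positive is closest to 0.5. The sketch below shows one query round with the classifier abstracted as a callable; `select_uncertain`, `active_learning_round`, the batch size `k`, and the toy scores are all hypothetical, and an arbitrary scoring function stands in for the thesis's DBN.

```python
from typing import Callable, List, Tuple


def select_uncertain(
    unlabeled: List[str],
    predict_proba: Callable[[str], float],
    k: int,
) -> List[str]:
    """Return the k texts whose P(positive) is closest to 0.5 (least confident)."""
    return sorted(unlabeled, key=lambda text: abs(predict_proba(text) - 0.5))[:k]


def active_learning_round(
    labeled: List[Tuple[str, int]],
    unlabeled: List[str],
    predict_proba: Callable[[str], float],
    oracle: Callable[[str], int],
    k: int = 2,
) -> Tuple[List[Tuple[str, int]], List[str]]:
    """One query round: pick uncertain texts, label them, move them to the labeled pool.

    The abstract describes retraining the same deep network on the enlarged
    labeled pool over several rounds; that retraining step is omitted here.
    """
    queried = select_uncertain(unlabeled, predict_proba, k)
    new_labeled = labeled + [(text, oracle(text)) for text in queried]
    remaining = [text for text in unlabeled if text not in queried]
    return new_labeled, remaining


if __name__ == "__main__":
    # Toy model scores and toy ground-truth labels (hypothetical data).
    toy_scores = {"great screen": 0.95, "okay battery": 0.52, "meh": 0.45, "awful": 0.05}
    true_labels = {"great screen": 1, "okay battery": 1, "meh": 0, "awful": 0}
    labeled, pool = active_learning_round(
        labeled=[],
        unlabeled=list(toy_scores),
        predict_proba=toy_scores.__getitem__,
        oracle=true_labels.__getitem__,  # stand-in for a human annotator
    )
    print(labeled)  # [('okay battery', 1), ('meh', 0)]
    print(pool)     # ['great screen', 'awful']
```

Uncertainty sampling spends annotation effort where the current model is least sure; other criteria (margin, entropy, committee disagreement) would slot into `select_uncertain` without changing the loop.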
[Degree-granting institution]: Xi'an University of Technology
[Degree level]: Master's
[Year awarded]: 2017
[Classification number]: TP391.1; TP18





