
Sentiment Orientation Analysis of Weibo Comments Based on Deep Learning

Published: 2018-08-30 09:13
[Abstract]: With the rapid growth of the mobile Internet, netizens have become increasingly eager to take part in discussions of hot social topics, and Sina Weibo has become an important platform for expressing opinions and emotions. The social network built on Sina Weibo reflects, to a large extent, the social behavior and sentiment tendencies of Chinese users. How to quickly mine the sentiment information hidden in Sina Weibo, and so provide useful supporting information for government and corporate decision-making, is becoming a research hotspot in natural language processing. Traditional sentiment analysis spends a great deal of time extracting features from the data and often has to be combined with grammatical rules to achieve good results. In the era of big data, as data volumes keep growing, manual feature extraction becomes ever more difficult.

This thesis proposes combining word vectors with deep learning to learn the sentiment information in the data. The unsupervised Word2vec and GloVe models train the data into word vectors, which replace manually extracted features and thereby save human effort. Deep learning models then learn the sentiment information in the word vectors automatically. Finally, comparative experiments verify that deep learning models achieve good results on sentence-level sentiment analysis.

Weibo comment data are trained with the Word2vec and GloVe language models into two kinds of word vectors, and each kind is fed into the shallow learning models (SVM, Logistic Regression, Naive Bayes) and the deep learning models (LSTM, CNN, LSTM+CNN). Both groups of models learn the sentiment information hidden in the word vectors and output a sentiment classification. Performance metrics such as accuracy and recall are computed from the experimental results: the best shallow learning model reaches an accuracy close to 78.1%, and the best deep learning model reaches an accuracy close to 84.5%.

Comparing the experimental results, the thesis finds that, among the deep learning models, LSTM can store long-distance information and CNN can extract features of different dimensions, and that these capabilities mine the sentiment information hidden in the word vectors more effectively. The shallow learning models, by contrast, lose the semantic information between words when mining the word vectors, which is one of the main reasons for their lower performance. Compared with Word2vec word vectors, GloVe word vectors can exploit global statistical information and so store more sentiment information, whereas Word2vec can only use local information. As a result, GloVe word vectors give better sentiment classification than Word2vec word vectors.
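The abstract does not say how sentences were turned into fixed-length inputs for the shallow models. As a rough illustration only, the sketch below assumes mean pooling of word vectors, a common choice, and shows why such a feature discards word order: two sentences with the same words in a different order map to the same vector. It also computes accuracy and recall, the two metrics the abstract names. The toy vectors, the example tokens, and the function names `mean_pool`, `accuracy`, and `recall` are all hypothetical and do not come from the thesis.

```python
from typing import Dict, List, Sequence

# Toy 2-dimensional word vectors with made-up values (NOT from the thesis).
TOY_VECTORS: Dict[str, List[float]] = {
    "not": [-1.0, 0.0],
    "good": [1.0, 1.0],
    "bad": [-1.0, -1.0],
    "really": [0.0, 0.5],
}


def mean_pool(tokens: Sequence[str], vectors: Dict[str, List[float]]) -> List[float]:
    """Average the vectors of known tokens into one fixed-length feature.

    Assumption: this is one common way to feed word vectors to a shallow
    classifier; the thesis may have used a different scheme.
    """
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        raise ValueError("no token has a word vector")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def recall(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Fraction of truly positive samples that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Same words, different order, opposite meaning, identical pooled feature.
a = mean_pool(["not", "bad", "really", "good"], TOY_VECTORS)
b = mean_pool(["not", "good", "really", "bad"], TOY_VECTORS)
print(a == b)  # True: word order is lost after averaging
```

A sequence model such as an LSTM reads the tokens one at a time, so the two orderings above would produce different hidden states. That is consistent with the abstract's explanation of why the shallow models lose information between words.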
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP391.1;TP393.092


Article ID: 2212639

Article link: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/2212639.html

