Sentiment Analysis of Social Media Text

Published: 2018-01-12 03:34

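  Keywords: sentiment analysis of social media text. Source: Nanjing University of Science and Technology, 2017 master's thesis. Document type: degree thesis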


  Related topics: social media, sentiment classification, semantic rules, fusion method, sentiment lexicon, ensemble learning


[Abstract]: In recent years Internet technology has continued to develop rapidly, and a large number of Internet applications have emerged, including social networking applications. Users continuously generate large volumes of social media data about people, products, and events. Sentiment analysis mines the subjective sentiment expressed in text. Applied to social media such as microblogs (Weibo), it can uncover latent commercial and social value, with important applications in product feedback, recommendation algorithms, public opinion monitoring, and tracking of trending events. This thesis studies sentiment classification for social media. The first two chapters review the state of research and the basic techniques in detail. Chapters 3 to 5 then propose sentiment classification methods, each addressing a different shortcoming of existing work.

(1) A sentiment classification method that fuses machine learning with semantic rules. Targeting the characteristics of Chinese microblogs, the thesis adds several semantic rules to the traditional lexicon-based classification method, which improves the precision with which a sample's sentiment orientation is measured. It then proposes a feature-embedded fusion method: the extracted lexicon-rule features are transformed and expanded, then added to the basic feature template. This fusion approach is superior to conventional fusion methods in both the granularity of sentiment analysis and feature representation. Experiments show a substantial performance gain, and the method ranked first in the restricted-resource track of the microblog sentiment classification task at the 2015 Chinese Opinion Analysis Evaluation (COAE2015).

(2) Use of naturally annotated data to support sentiment classification on social media. Chapter 4 takes a neural-network-based lexicon construction method as its starting point and improves it by adding semantic rules and setting sample weights. In comparisons with manually annotated lexicons and other lexicon learning algorithms, the lexicon learned by this method performs best. Using this lexicon, the method took first place in the sentiment word extraction task of the 2016 Chinese Opinion Analysis Evaluation (COAE2016).

(3) Ensemble learning on naturally annotated data to improve classification performance. Experiments first verify that a Bagging ensemble is superior to a single model in stability and generalization ability. On this basis, a Stacking ensemble model is proposed: through second-level learning over the predictions of multiple base classifiers, together with the original lexicon features, it achieves a comprehensive combination of naturally annotated and manually annotated data. Experiments show that this model's classification performance exceeds that of the combination that adds lexicon features only.
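
The feature-embedded fusion described in point (1) of the abstract can be illustrated with a small sketch. The abstract does not list the actual semantic rules, lexicon, or feature template, so everything below is an assumption made for illustration: the toy lexicon, the negation and degree-adverb rules, the `LEX_*` feature names, and the strength threshold are all hypothetical, and English tokens stand in for segmented Chinese microblog text. The sketch only shows the general idea of turning a rule-based lexicon score into discrete features and placing them in the same feature dictionary as the base unigram features, so that a downstream classifier sees both.

```python
from collections import Counter

# Hypothetical toy resources; none of these values come from the thesis.
LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "awful": -2.0}
NEGATIONS = {"not", "never"}
INTENSIFIERS = {"very": 1.5, "slightly": 0.5}


def lexicon_score(tokens):
    """Sum lexicon polarities, applying negation and degree rules."""
    score = 0.0
    sign, weight = 1.0, 1.0
    for tok in tokens:
        if tok in NEGATIONS:
            sign = -sign  # a negation flips the next sentiment word
        elif tok in INTENSIFIERS:
            weight *= INTENSIFIERS[tok]  # a degree adverb scales it
        elif tok in LEXICON:
            score += sign * weight * LEXICON[tok]
            sign, weight = 1.0, 1.0  # pending modifiers are consumed here
    return score


def rule_features(tokens):
    """Convert the rule-based score into discrete (hypothetical) features."""
    score = lexicon_score(tokens)
    if score > 0:
        polarity = "pos"
    elif score < 0:
        polarity = "neg"
    else:
        polarity = "neu"
    strength = "strong" if abs(score) >= 1.5 else "weak"
    return {f"LEX_POLARITY={polarity}": 1, f"LEX_STRENGTH={strength}": 1}


def fused_features(tokens):
    """Base unigram counts plus the embedded lexicon-rule features."""
    features = dict(Counter(f"W={t}" for t in tokens))
    features.update(rule_features(tokens))
    return features


if __name__ == "__main__":
    print(fused_features("this phone is not very good".split()))
```

A dictionary of this shape could be passed to any feature-dictionary vectorizer and linear classifier. The point of the design is that the lexicon evidence enters the model as features at the sample level instead of being combined with the classifier output afterwards.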
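[Degree-granting institution]: Nanjing University of Science and Technology
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP391.1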




Article ID: 1412536


Article link: https://www.wllwen.com/shoufeilunwen/xixikjs/1412536.html

