Dictionary-Based Emotion Analysis of Chinese Microblogs
Published: 2018-06-25 10:37
Topic: Weibo + emotion analysis; Source: Master's thesis, Nanjing University of Aeronautics and Astronautics, 2014
[Abstract]: In recent years, Weibo has attracted growing attention and popularity and has become an important platform for people to express their emotions and feelings. Weibo has therefore become an important resource for opinion mining and sentiment analysis, attracting the attention of many researchers. Emotion analysis of Weibo posts makes it possible to track trends in public emotion quickly and is also meaningful for personal emotion regulation. Based on an analysis of Weibo, this thesis proposes a dictionary-based rule method to recognize six emotions expressed in Weibo posts: joy, sadness, anger, fear, disgust, and surprise.

Firstly, a dictionary-based rule approach is proposed. Experiments analyze in detail how existing Chinese emotion dictionaries currently perform in Weibo emotion analysis, the main problems are discussed, and the linguistic characteristics of emotional expression on Weibo are examined in depth. On this basis, two dictionaries for Weibo emotion analysis are constructed: the Weibo emoticon dictionary EmoDic and the Chinese emotion dictionary SixDic. EmoDic is constructed mainly with a mutual information method, while SixDic is obtained by building on part-of-speech analysis of the text and filtering candidate words with a combination of mutual information and emotion annotation information.

Secondly, detailed rules are formulated from an analysis of the dictionaries and of how emotions are expressed on Weibo, and six-class emotion recognition experiments are carried out with the two dictionaries. With SixDic, Weibo emotion analysis reaches a coverage of 65.8% and an accuracy of 64%, about 12% higher than DUTIR, the affective lexicon ontology of Dalian University of Technology, under the same method. The emoticon dictionary EmoDic achieves a higher recall than a manually selected set of emoticons.
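The abstract says EmoDic is built "mainly by a mutual information method" but gives no formula or thresholds. As a minimal sketch, assuming pointwise mutual information (PMI) between an emoticon and an emotion label, estimated from post-level counts in a labeled corpus, each emoticon could be assigned to the class it is most strongly associated with. The function `build_emoticon_lexicon`, the thresholds `min_count` and `min_pmi`, and the toy posts are hypothetical illustrations, not the thesis's actual procedure.

```python
import math
from collections import Counter


def build_emoticon_lexicon(posts, min_count=2, min_pmi=0.0):
    """Assign each emoticon the emotion label with the highest PMI.

    posts: iterable of (emoticons, label) pairs, where `emoticons` lists the
    emoticon tokens found in one post and `label` is its annotated emotion.
    Hypothetical helper: the thresholds are illustrative assumptions.
    Returns {emoticon: (label, pmi)}.
    """
    n_posts = 0
    label_counts = Counter()  # number of posts per emotion label
    emo_counts = Counter()    # number of posts containing each emoticon
    joint_counts = Counter()  # number of posts containing emoticon e with label c
    for emoticons, label in posts:
        n_posts += 1
        label_counts[label] += 1
        for e in set(emoticons):  # count each emoticon at most once per post
            emo_counts[e] += 1
            joint_counts[(e, label)] += 1

    lexicon = {}
    for e, ce in emo_counts.items():
        if ce < min_count:  # too rare to estimate reliably
            continue
        best_label, best_pmi = None, float("-inf")
        for label, cl in label_counts.items():
            cj = joint_counts[(e, label)]
            if cj == 0:
                continue
            # PMI(e, c) = log( P(e, c) / (P(e) * P(c)) ) with all probabilities
            # estimated as post counts divided by n_posts.
            pmi = math.log((cj * n_posts) / (ce * cl))
            if pmi > best_pmi:
                best_label, best_pmi = label, pmi
        if best_label is not None and best_pmi > min_pmi:
            lexicon[e] = (best_label, best_pmi)
    return lexicon


# Toy corpus using Weibo-style bracketed emoticons (made-up data).
toy_posts = [
    (["[哈哈]"], "joy"),
    (["[哈哈]"], "joy"),
    (["[泪]"], "sadness"),
    (["[泪]", "[哈哈]"], "sadness"),
    ([], "anger"),
]
for emoticon, (label, score) in build_emoticon_lexicon(toy_posts).items():
    print(emoticon, label, round(score, 2))
```

On the toy corpus, `[哈哈]` is assigned to joy and `[泪]` to sadness, even though `[哈哈]` also appears once in a sad post, because PMI compares co-occurrence against what the marginal frequencies alone would predict.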
Used together with SixDic, EmoDic raises the coverage of emotion analysis to 80.4%. The system achieves its best performance by weighting emoticons and applying negation rules, reaching an accuracy of 74.1%.

Finally, supervised emotion classification experiments are carried out with a support vector machine (SVM), using unigrams, the Chinese emotion dictionary, the emoticon dictionary, negation words, and punctuation marks as features. The results show that dictionary features are more effective than unigrams for emotion recognition. SVM classification performs best when SixDic, EmoDic, negation words, and punctuation marks are used together as features, reaching an accuracy of 61.7%. The experimental results show that the dictionary-based rule method has a clear advantage for fine-grained Weibo emotion recognition.
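The rule-based system that reaches 74.1% accuracy combines dictionary matches with emoticon weighting and negation rules, but the abstract does not give the weights or the scope of a negation word. The following is only a minimal sketch under assumed choices: posts are already word-segmented, each matching emoticon adds a hypothetical `emoticon_weight` to its class, each matching emotion word adds 1, and an emotion word is ignored when a negator from the illustrative `NEGATIONS` set appears within `window` tokens before it. `classify_post`, `NEGATIONS`, and both parameters are hypothetical names, not the thesis's implementation.

```python
from collections import defaultdict

# Illustrative negation words (an assumption; the thesis's list is not given).
NEGATIONS = {"不", "没", "没有", "别", "未"}


def classify_post(tokens, word_lex, emoticon_lex, emoticon_weight=2.0, window=2):
    """Score six-class emotions for one segmented post with dictionary rules.

    tokens: list of words and emoticon tokens, already segmented.
    word_lex / emoticon_lex: dicts mapping a token to an emotion label.
    Returns the highest-scoring label, or None if no dictionary entry fires.
    """
    scores = defaultdict(float)
    for i, tok in enumerate(tokens):
        if tok in emoticon_lex:
            # Emoticons are weighted more heavily than words (assumed weight).
            scores[emoticon_lex[tok]] += emoticon_weight
        elif tok in word_lex:
            # Assumed negation rule: drop this word's vote if a negator occurs
            # within `window` tokens before it.
            preceding = tokens[max(0, i - window):i]
            if any(t in NEGATIONS for t in preceding):
                continue
            scores[word_lex[tok]] += 1.0
    if not scores:
        return None  # nothing matched: the post is not covered
    # Ties go to the label that was scored first.
    return max(scores, key=scores.get)


word_lex = {"开心": "joy", "难过": "sadness", "害怕": "fear"}
emoticon_lex = {"[泪]": "sadness"}
print(classify_post(["今天", "很", "开心"], word_lex, emoticon_lex))
print(classify_post(["我", "不", "开心"], word_lex, emoticon_lex))
print(classify_post(["开心", "[泪]"], word_lex, emoticon_lex))
```

Returning `None` when nothing matches mirrors the idea that the dictionaries do not cover every post; the abstract does not state whether its coverage and accuracy figures are computed exactly this way. Discarding a negated word, rather than flipping it, is the simplest choice for six classes, since unlike positive/negative polarity there is no obvious "opposite" emotion to flip to.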
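[Degree-granting institution]: Nanjing University of Aeronautics and Astronautics
[Degree level]: Master's
[Year conferred]: 2014
[Classification number]: TP393.092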
Article ID: 2065624
Link: https://www.wllwen.com/guanlilunwen/ydhl/2065624.html