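Research on Sentiment Recognition and Classification Techniques for Chinese Microblog Text
Published: 2018-04-21 02:07
Topics: microblog content analysis + subjective sentence recognition; Source: Central China Normal University, master's thesis, 2014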
[Abstract]: As one of today's most popular social media, microblogs are characterized by fast information propagation, large information volume, and loosely standardized content, and they have become one of the important platforms for exchanging and sharing information on the Internet. Sentiment recognition and classification of microblog text is emerging as a new research hotspot, and a difficult problem, in natural language processing. Results in this area have practical value for applications such as helping enterprises promptly understand user feedback on products or services, gauging public opinion, and monitoring public sentiment.
This study takes a first step toward solving subjective sentence recognition and sentiment classification for Chinese microblog text. The specific research content is as follows:
First, by analyzing microblog text, we summarize several of its structural features and build an emoticon sentiment library. Based on an analysis of the repeated punctuation marks that often appear in microblog text, we compile a punctuation sentiment library that assists sentiment classification. Combining the sentiment vocabulary ontology with the emoticon and punctuation sentiment libraries, we construct a sentiment feature library for Chinese microblog text.
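To make the idea of a combined sentiment feature library concrete, the sketch below matches a microblog post against three small dictionaries. Every entry, the `emphasis` label for punctuation runs, and the function name `match_sentiment_features` are hypothetical illustrations; the contents of the thesis's actual libraries are not given in this abstract.

```python
import re

# Hypothetical toy entries standing in for the three libraries.
LEXICON = {"开心": "positive", "喜欢": "positive", "讨厌": "negative", "失望": "negative"}
EMOTICONS = {"[哈哈]": "positive", "[泪]": "negative", "[怒]": "negative"}
# A run of two or more identical punctuation marks, e.g. "!!!" or "??".
REPEATED_PUNCT = re.compile(r"([!?！？。~～])\1+")


def match_sentiment_features(text):
    """Return (source, matched_text, label) tuples found in one post."""
    hits = []
    for word, polarity in LEXICON.items():
        if word in text:
            hits.append(("word", word, polarity))
    for emoticon, polarity in EMOTICONS.items():
        if emoticon in text:
            hits.append(("emoticon", emoticon, polarity))
    for match in REPEATED_PUNCT.finditer(text):
        # Assumption: repeated punctuation is treated as an emphasis cue.
        hits.append(("punctuation", match.group(0), "emphasis"))
    return hits


if __name__ == "__main__":
    for hit in match_sentiment_features("今天好开心[哈哈]!!!"):
        print(hit)
```

Substring matching keeps the sketch short; a real system would normally segment the text into words first so that lexicon entries are not matched across word boundaries.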
Second, we extract sentiment features from microblog text using word frequency statistics, expected cross entropy, TF-IDF, and the variance of TF-IDF. The experimental results show that the feature recognition and extraction method that combines variance with TF-IDF weighting achieves the best results.
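The abstract does not state how variance and TF-IDF are combined, so the following is only a sketch under an assumed rule: each term is scored by its mean TF-IDF multiplied by the variance of its TF-IDF across documents. The function names and the toy corpus are hypothetical, and documents are assumed to be non-empty token lists.

```python
import math
from collections import Counter
from statistics import mean, pvariance


def tfidf_matrix(docs):
    """docs: list of non-empty token lists. Returns (vocab, rows), rows[d][t] = tf * idf."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    idf = {t: math.log(n_docs / doc_freq[t]) for t in vocab}
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([counts[t] / len(doc) * idf[t] for t in vocab])
    return vocab, rows


def rank_by_variance_weighted_tfidf(docs):
    """Rank terms by mean TF-IDF times TF-IDF variance (an assumed combination)."""
    vocab, rows = tfidf_matrix(docs)
    scores = {}
    for j, term in enumerate(vocab):
        column = [row[j] for row in rows]
        scores[term] = mean(column) * pvariance(column)
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    corpus = [["好", "开心", "开心"], ["好", "失望"], ["好", "天气"]]
    print(rank_by_variance_weighted_tfidf(corpus))
```

A term that appears in every document gets an IDF of zero and therefore a zero score, while a term concentrated in few documents gets both a high TF-IDF and a high variance, which is the intuition behind weighting by variance.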
Third, for sentiment recognition and classification, we first determine whether a microblog text is subjective or objective, using the naive Bayes method and the support vector machine (SVM) method to identify subjective sentences. The experimental results show that naive Bayes identifies subjective sentences more effectively. We then study sentiment classification of the microblog texts identified as subjective, using SVM-based one-vs-one and one-vs-rest classification. The experimental results show that the SVM-based one-vs-one method performs better.
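As an illustration of the subjective sentence identification step, here is a minimal multinomial naive Bayes classifier with add-one smoothing. It is a generic textbook sketch rather than the thesis's implementation; the class name, the `subjective`/`objective` labels, and the four training posts are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.token_counts[label].update(tokens)
        self.vocab = {t for counts in self.token_counts.values() for t in counts}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokens:
                if tok in self.vocab:  # skip tokens never seen in training
                    score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    # Hypothetical pre-segmented training posts.
    train_docs = [
        ["我", "喜欢", "这个", "手机"],
        ["太", "失望", "了"],
        ["发布会", "今天", "举行"],
        ["手机", "今天", "发布"],
    ]
    train_labels = ["subjective", "subjective", "objective", "objective"]
    model = NaiveBayes().fit(train_docs, train_labels)
    print(model.predict(["我", "太", "喜欢", "了"]))
    print(model.predict(["今天", "发布"]))
```

Working in log space avoids numerical underflow when many token probabilities are multiplied together.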
Fourth, based on the sentiment feature extraction method and the sentiment recognition and classification methods proposed above, we build a corresponding prototype system. A series of experiments on a public evaluation dataset verifies the feasibility and effectiveness of the proposed methods.
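The one-vs-one and one-vs-rest schemes compared in the third point differ in how a multi-class problem is split into binary problems for a binary classifier such as an SVM. The sketch below only enumerates those binary tasks and shows majority voting for one-vs-one; it trains no SVM. The emotion labels are hypothetical, since the abstract does not list the class set.

```python
from itertools import combinations


def one_vs_one_tasks(classes):
    """One binary task per pair of classes: K classes give K*(K-1)/2 tasks."""
    return list(combinations(classes, 2))


def one_vs_rest_tasks(classes):
    """One binary task per class against all others: K classes give K tasks."""
    return [(c, [other for other in classes if other != c]) for c in classes]


def vote(pairwise_winners):
    """One-vs-one prediction: the class that wins the most pairwise decisions."""
    tally = {}
    for winner in pairwise_winners:
        tally[winner] = tally.get(winner, 0) + 1
    return max(tally, key=tally.get)


if __name__ == "__main__":
    emotions = ["joy", "anger", "sadness", "fear"]  # hypothetical label set
    print(len(one_vs_one_tasks(emotions)), "one-vs-one tasks")
    print(len(one_vs_rest_tasks(emotions)), "one-vs-rest tasks")
    print(vote(["joy", "joy", "anger", "joy", "sadness", "fear"]))
```

One-vs-one trains more classifiers, but each sees only two classes and a smaller, more balanced training set. One-vs-rest trains fewer classifiers, each facing one class against a typically much larger pool of all the others.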
[Degree-granting institution]: Central China Normal University
[Degree level]: Master's
[Year degree granted]: 2014
[Classification number]: TP391.1; TP393.092
Article ID: 1780447
Article link: https://www.wllwen.com/guanlilunwen/ydhl/1780447.html