Research on Sentiment Classification of Weibo Comments Based on CRFs
Published: 2019-04-11 16:46
[Abstract]: Information travels in many ways in the information society, and through Weibo, a convenient channel for exchanging information, it now reaches every corner of daily life. The Weibo platform has tens of thousands of users, who often post opinions colored by personal feeling about particular events or trending topics. Analyzing the large volume of text retained on the platform can therefore reveal the prevailing moods, sentiments, and value orientations of most people, and give decision makers concerned with the relevant issues a basis for analysis.
This thesis first surveys and summarizes existing research on sentiment analysis of text corpora. It then compares several commonly used sentiment classification models, including similarity-based methods, Bayesian classifiers, and support vector machines. After weighing the strengths and weaknesses of each model, it adopts conditional random fields (CRFs), a widely accepted method for sentiment classification. Next, the Chinese sentences in the text are annotated with features at word-level granularity, the CRF model is trained on the experimental corpus, and the trained model is used to judge the sentiment orientation of comments. Finally, a grading mechanism for sentiment strength is proposed, so that sentiment analysis is no longer limited to the three cases of positive, neutral, and negative: the experimental results quantify these three categories, and the quantified results are used to rank sentiment by strength.
The results of analyzing the corpus with CRFs show that CRFs classify sentiment-bearing sentences well, and the experimental results largely confirm the feasibility of the proposed grading mechanism for sentiment strength. The quantified results can provide data support for decision makers. Some aspects of the study still need improvement, such as the corpus, which is not yet fully complete; these will be addressed in future work.
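The pipeline described in the abstract (word-level labeling, CRF decoding, then quantified ranking) can be pictured with a small sketch. This is not the thesis's implementation: the tag set POS/NEU/NEG, the hand-set weights standing in for trained CRF parameters, the English toy words standing in for segmented Chinese words, and the intensity formula (positive words minus negative words, divided by length) are all assumptions made purely for illustration.

```python
from typing import Dict, List, Tuple

TAGS = ["POS", "NEU", "NEG"]  # assumed word-level tag set

# Hypothetical hand-set weights; a real CRF learns these from a labeled corpus.
EMISSION: Dict[Tuple[str, str], float] = {
    ("great", "POS"): 2.0,
    ("love", "POS"): 1.5,
    ("terrible", "NEG"): 2.0,
    ("boring", "NEG"): 1.0,
}
NEUTRAL_BIAS = 0.5  # score for tagging any word as NEU
TRANSITION: Dict[Tuple[str, str], float] = {
    ("POS", "POS"): 0.3, ("NEG", "NEG"): 0.3,
    ("POS", "NEG"): -0.3, ("NEG", "POS"): -0.3,
}


def emission_score(word: str, tag: str) -> float:
    """Score of assigning `tag` to `word` (unknown words score 0 for POS/NEG)."""
    if tag == "NEU":
        return NEUTRAL_BIAS
    return EMISSION.get((word, tag), 0.0)


def viterbi(words: List[str]) -> List[str]:
    """Return the highest-scoring tag sequence under the linear-chain scores."""
    if not words:
        return []
    # Each column maps tag -> (best score ending in tag, previous tag).
    columns = [{t: (emission_score(words[0], t), None) for t in TAGS}]
    for word in words[1:]:
        column = {}
        for tag in TAGS:
            prev, score = max(
                ((p, columns[-1][p][0] + TRANSITION.get((p, tag), 0.0)) for p in TAGS),
                key=lambda item: item[1],
            )
            column[tag] = (score + emission_score(word, tag), prev)
        columns.append(column)
    # Follow back-pointers from the best final tag.
    tag = max(TAGS, key=lambda t: columns[-1][t][0])
    path = [tag]
    for column in reversed(columns[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]


def intensity(words: List[str]) -> float:
    """Assumed grading rule: (positive words - negative words) / word count."""
    tags = viterbi(words)
    if not tags:
        return 0.0
    return (tags.count("POS") - tags.count("NEG")) / len(tags)


def rank_comments(comments: List[str]) -> List[Tuple[str, float]]:
    """Rank whitespace-segmented comments from most positive to most negative."""
    scored = [(c, intensity(c.split())) for c in comments]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for comment, score in rank_comments(
        ["terrible boring plot", "this movie is great", "it is fine"]
    ):
        print(f"{score:+.2f}  {comment}")
```

With real Weibo data, the comments would first need Chinese word segmentation, and the weights would come from training a CRF toolkit on the annotated corpus rather than from a hand-written table.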
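[Degree-granting institution]: Northeast Normal University
[Degree level]: Master's
[Year degree conferred]: 2014
[Classification number]: TP393.092
Article ID: 2456587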
Article link: https://www.wllwen.com/guanlilunwen/ydhl/2456587.html