Research on Sentiment Information Extraction Methods for Chinese Microblogs

Published: 2018-03-10 15:27

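  Topic: Chinese microblogs  Focus: sentiment information extraction  Source: Beijing Information Science and Technology University, master's thesis, 2015  Document type: degree thesis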


[Abstract]: With the spread of the Internet, the network has become the main channel through which people obtain and share information. Microblogs (Weibo), a newer platform for interactive communication, have become part of everyday online life, and research on microblog text is attracting growing attention. Sentiment analysis is an important topic within microblog text analysis, and sentiment information extraction, the foundational task of sentiment analysis for Chinese microblogs, has become an active research direction.

Sentiment information extraction for Chinese microblogs aims to convert unstructured sentiment text into a structured form, the sentiment information unit. The result can be applied directly to tasks such as user review analysis and decision support, and it can also serve other sentiment analysis tasks such as text sentiment classification. A sentiment information unit has four elements: the opinion target, the opinion words, the polarity, and the opinion holder. Microblog language, however, is informal: most microblog texts have incomplete syntactic structure and contain a large amount of redundant information and Internet slang, so existing text opinion mining methods perform poorly on them. Existing techniques therefore need to be adapted to the characteristics of microblogs. The main research content is as follows:

(1) Construction of the candidate set of opinion targets. Taking the characteristics of Chinese microblog text into account, the text is preprocessed, noun phrases are obtained by syntactic parsing and then post-processed, and a candidate set is built from nouns, noun phrases, and microblog topics. The experimental results of this step are analyzed.

(2) Filtering of candidate opinion targets. Three strategies are used. First, an SVM classifier filters the candidates using three features: semantic role information, minimum distance, and word frequency. Second, a weighted model computes a weight score for each candidate from several features and uses the score to decide whether the candidate is a correct opinion target. Third, because CRF models are well suited to sequence labeling, a CRF model filters the candidates using commonly used sentiment information extraction features together with sentiment word and semantic role labeling features.

(3) Polarity determination for opinion targets. If sentiment words occur near the opinion target, the sentiment word closest to the target is located and the target's polarity is read from a sentiment lexicon. If no sentiment word occurs near the target, the polarity of the whole microblog sentence is used instead, where the sentence polarity is obtained from a naive Bayes classifier.

(4) Based on the work above, a sentiment information extraction system for Chinese microblogs is designed and implemented. The system can be used to analyze the experimental results of the candidate set construction method, the candidate filtering methods, and the polarity determination method, and it can also be applied to practical sentiment information extraction tasks.
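The polarity rule described in the abstract (use the nearest sentiment word if one is close to the opinion target, otherwise fall back to the sentence-level polarity) can be illustrated with a minimal Python sketch. This is not the thesis's implementation. The lexicon entries, the `window` size used to interpret "near", the tie-breaking choice, and the names `SentimentUnit` and `target_polarity` are all assumptions made for illustration, and `sentence_polarity` is a stand-in callable for the naive Bayes sentence classifier, which is not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional, Sequence

# Hypothetical toy lexicon; the thesis uses a real sentiment lexicon.
SENTIMENT_LEXICON = {
    "喜欢": "positive",
    "好用": "positive",
    "失望": "negative",
    "卡顿": "negative",
}


@dataclass
class SentimentUnit:
    """The four elements of a sentiment information unit."""
    target: str                  # opinion target
    opinion_word: Optional[str]  # opinion word, if one was found nearby
    polarity: str                # polarity label
    holder: Optional[str] = None # opinion holder, if known


def target_polarity(
    tokens: Sequence[str],
    target_index: int,
    sentence_polarity: Callable[[Sequence[str]], str],
    window: int = 3,  # assumed meaning of "near": within 3 tokens
    lexicon: Mapping[str, str] = SENTIMENT_LEXICON,
    holder: Optional[str] = None,
) -> SentimentUnit:
    """Assign a polarity to the opinion target at tokens[target_index]."""
    best = None  # (distance, sentiment word) of the closest match so far
    for i, tok in enumerate(tokens):
        if i == target_index or tok not in lexicon:
            continue
        dist = abs(i - target_index)
        # Strict '<' keeps the leftmost word on ties (an arbitrary choice).
        if dist <= window and (best is None or dist < best[0]):
            best = (dist, tok)

    target = tokens[target_index]
    if best is not None:
        word = best[1]
        return SentimentUnit(target, word, lexicon[word], holder)
    # No sentiment word nearby: fall back to the sentence-level polarity.
    return SentimentUnit(target, None, sentence_polarity(tokens), holder)
```

In practice `sentence_polarity` would wrap a trained sentence classifier, and the input tokens would come from a Chinese word segmenter. Both are outside the scope of this sketch.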
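[Degree-granting institution]: Beijing Information Science and Technology University
[Degree level]: Master's
[Year degree granted]: 2015
[Classification number]: TP391.1; TP393.092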
