Research on Sentiment New Word Discovery and Sentiment Orientation for Weibo Short Texts

Published: 2019-04-29 12:51

[Abstract]: In an era of globally popular social networks, many new words and even new emoticons have emerged. They often appear together with trending social news and act as a weathervane for online public opinion. Effectively extracting new words from massive volumes of Weibo posts and analyzing their sentiment plays an important role in topic tracking and public opinion analysis of Weibo content. These new words carry fairly strong sentiment and, to some extent, represent the feelings of users. However, existing work on text sentiment orientation focuses mainly on domains such as product reviews and news reports; orientation analysis of Weibo new words still relies on traditional methods and lacks analysis of the features specific to Weibo new words, so its results are poor. The main research work of this thesis covers three aspects. First, it designs and implements a candidate new-word extraction method based on repeated-string statistics, using a generalized suffix tree to extract all possible candidate strings. Second, it proposes a new-word detection algorithm that combines linguistic rules with statistics to filter the candidate new words. It compares the performance of several classical statistics in new-word detection and ultimately selects mutual information as the internal statistic and left and right adjacency information entropy as the external statistic. It also considers and analyzes how to distinguish ordinary new words from sentiment new words. Third, building on this practical work, it proposes a neural-network-based algorithm for determining the sentiment of new words. The algorithm judges the polarity of a sentiment word from its context and uses word vectors to represent the semantic and syntactic features of new words, effectively combining local and global context information. The thesis adopts a multi-prototype language model that clusters contexts to obtain multiple sense-specific vectors for each word, and uses them to interpret new words semantically and determine their sentiment orientation in different contexts.
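The filtering statistics named in the abstract (mutual information as the internal statistic, left and right adjacency entropy as the external statistic) can be illustrated with a minimal sketch. This is not the thesis's implementation: it swaps the generalized suffix tree for brute-force n-gram enumeration, omits the linguistic rules entirely, normalizes all n-gram probabilities by the total character count as a simplification, and uses hypothetical names (`candidate_scores`, `filter_candidates`) and hypothetical thresholds.

```python
import math
from collections import Counter, defaultdict


def candidate_scores(corpus, max_len=4, min_count=2):
    """Score repeated substrings of length 2..max_len.

    Returns {candidate: (count, min_pmi, left_entropy, right_entropy)}.
    Brute-force enumeration stands in for a generalized suffix tree.
    """
    ngram_counts = Counter()
    left_neighbors = defaultdict(Counter)
    right_neighbors = defaultdict(Counter)
    total_chars = 0

    for line in corpus:
        total_chars += len(line)
        for n in range(1, max_len + 1):
            for i in range(len(line) - n + 1):
                gram = line[i:i + n]
                ngram_counts[gram] += 1
                if n >= 2:
                    # Record the characters adjacent to each occurrence.
                    if i > 0:
                        left_neighbors[gram][line[i - 1]] += 1
                    if i + n < len(line):
                        right_neighbors[gram][line[i + n]] += 1

    def prob(gram):
        # Simplification: every n-gram is normalized by the character total.
        return ngram_counts[gram] / total_chars

    def entropy(counter):
        total = sum(counter.values())
        if total == 0:
            return 0.0
        return -sum(c / total * math.log2(c / total) for c in counter.values())

    scores = {}
    for gram, count in ngram_counts.items():
        if len(gram) < 2 or count < min_count:
            continue
        # Internal statistic: weakest pointwise mutual information
        # over all two-way splits of the candidate.
        min_pmi = min(
            math.log2(prob(gram) / (prob(gram[:k]) * prob(gram[k:])))
            for k in range(1, len(gram))
        )
        # External statistic: entropy of left and right neighbor characters.
        scores[gram] = (
            count,
            min_pmi,
            entropy(left_neighbors[gram]),
            entropy(right_neighbors[gram]),
        )
    return scores


def filter_candidates(scores, pmi_threshold=1.0, entropy_threshold=1.0):
    """Keep candidates that are internally cohesive and externally free.

    Both thresholds are illustrative assumptions, not values from the thesis.
    """
    return sorted(
        gram
        for gram, (_, pmi, left_h, right_h) in scores.items()
        if pmi >= pmi_threshold and min(left_h, right_h) >= entropy_threshold
    )
```

A high minimum mutual information suggests the characters of a candidate belong together, while high adjacency entropy on both sides suggests the string appears in varied contexts and is therefore likely a complete word, not a fragment of a longer one. In practice the thresholds would need to be tuned on a labeled Weibo corpus.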
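[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree awarded]: 2016
[Classification number]: TP391.1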

[References]