Research on Emotional New Word Discovery and Sentiment Orientation for Weibo Short Texts
[Abstract]: As social networks have swept the world, many new words, and even new emojis, have emerged. They typically arise alongside trending social news and act as a weathervane of online public opinion. Effectively extracting new words from the massive volume of Weibo posts and analyzing their sentiment therefore plays an important role in topic tracking and public opinion analysis of Weibo content. These new words carry strong emotion and, to some extent, reflect how users feel. However, existing research on text sentiment orientation focuses mainly on domains such as product reviews and news reports. Sentiment orientation analysis of Weibo new words still relies on traditional methods and lacks any analysis of the features that characterize Weibo new words, so its performance is poor. The main work of this thesis covers three aspects.
First, this thesis designs and implements a candidate new word extraction method based on repeated-string statistics, using a generalized suffix tree to extract all possible candidate strings.
Second, this thesis proposes a new word detection algorithm that combines linguistic rules with statistics to filter the candidate new words. After comparing the performance of several classical statistics for new word detection, it adopts mutual information as the internal statistic and left and right adjacency entropy as the external statistic. The thesis also analyzes how ordinary new words differ from emotional new words.
Third, building on this practical work, the thesis proposes a neural-network-based algorithm for deciding the sentiment of new words. It judges the polarity of an emotional new word from the word's context, uses word vectors to represent the new word's semantic and grammatical features, and effectively combines local and global context information.
A multi-source language model clusters the contexts in which a word occurs to derive separate vectors for the senses of a polysemous word; semantic analysis of the new words then determines their sentiment orientation in each context.
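The first step, collecting repeated strings as candidate new words, can be illustrated with a short Python sketch. The thesis uses a generalized suffix tree; purely for brevity, this sketch swaps in a plain hash-map count of every substring of length 2 to `max_len`, which finds the same repeated strings up to that length at a higher memory cost. The function name `repeated_strings`, the parameters `max_len` and `min_freq`, and the toy posts are illustrative assumptions, not taken from the thesis.

```python
from collections import Counter
from typing import Iterable


def repeated_strings(texts: Iterable[str], max_len: int = 4,
                     min_freq: int = 2) -> Counter:
    """Count every substring of length 2..max_len in each text and keep those
    seen at least min_freq times as candidate new words.

    Hash-map counting stands in for the generalized suffix tree used in the
    thesis; substrings never cross text boundaries, just as a generalized
    suffix tree keeps each input string separate.
    """
    counts: Counter = Counter()
    for text in texts:
        n = len(text)
        for i in range(n):
            # Lengths 2..max_len, truncated at the end of the current text.
            for length in range(2, min(max_len, n - i) + 1):
                counts[text[i:i + length]] += 1
    # Keep only strings that repeat often enough to be candidates.
    return Counter({s: c for s, c in counts.items() if c >= min_freq})


posts = ["蓝瘦香菇", "今天蓝瘦香菇", "蓝瘦"]  # toy input, not thesis data
candidates = repeated_strings(posts)
print(candidates["蓝瘦"])  # → 3
```

The candidate set is deliberately over-inclusive (fragments such as `瘦香` survive too); removing such fragments is the job of the statistical filtering step.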
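The second step filters candidates with an internal statistic (mutual information) and an external one (left and right adjacency entropy). The abstract does not give exact formulas, so this sketch assumes two common variants: the internal score is the minimum pointwise mutual information over every binary split of the candidate, with probabilities approximated as occurrence counts divided by the total number of characters, and the external score is the entropy of the characters immediately to the left and right of each occurrence. The function names, the choice to skip text boundaries, and the thresholds passed to the hypothetical `keep_candidate` are all assumptions.

```python
import math
from collections import Counter
from typing import List, Tuple


def occurrences(corpus: List[str], s: str) -> List[Tuple[str, int]]:
    """Return (text, start) for every possibly overlapping occurrence of s."""
    hits = []
    for text in corpus:
        start = text.find(s)
        while start != -1:
            hits.append((text, start))
            start = text.find(s, start + 1)
    return hits


def min_pmi(corpus: List[str], word: str) -> float:
    """Internal statistic: minimum PMI over all binary splits of word
    (word must have at least two characters). Probabilities are approximated
    as count / total characters, an assumption made for simplicity."""
    total = sum(len(t) for t in corpus)

    def prob(s: str) -> float:
        return len(occurrences(corpus, s)) / total

    p_word = prob(word)
    if p_word == 0.0:
        return -math.inf
    # If word occurs, both halves occur too, so the denominator is nonzero.
    return min(math.log(p_word / (prob(word[:k]) * prob(word[k:])))
               for k in range(1, len(word)))


def entropy(counts: Counter) -> float:
    """Shannon entropy (natural log) of a frequency table."""
    n = sum(counts.values())
    return -sum(c / n * math.log(c / n) for c in counts.values()) if n else 0.0


def adjacency_entropy(corpus: List[str], word: str) -> Tuple[float, float]:
    """External statistic: entropy of left and right neighbouring characters.
    Occurrences at a text boundary contribute no neighbour on that side."""
    left, right = Counter(), Counter()
    for text, start in occurrences(corpus, word):
        end = start + len(word)
        if start > 0:
            left[text[start - 1]] += 1
        if end < len(text):
            right[text[end]] += 1
    return entropy(left), entropy(right)


def keep_candidate(corpus: List[str], word: str,
                   min_mi: float, min_entropy: float) -> bool:
    """Keep a candidate only if it is internally cohesive and has varied
    neighbours on both sides (thresholds are hypothetical)."""
    left_h, right_h = adjacency_entropy(corpus, word)
    return min_pmi(corpus, word) >= min_mi and min(left_h, right_h) >= min_entropy


corpus = ["我蓝瘦了", "你蓝瘦吗", "他蓝瘦啊", "蓝瘦香菇"]  # toy corpus
print(min_pmi(corpus, "蓝瘦"), adjacency_entropy(corpus, "蓝瘦"))
```

Taking the minimum over splits means a candidate is only as cohesive as its weakest internal boundary, and taking the smaller of the two entropies rejects strings that always share the same neighbour on one side, which usually marks a fragment of a longer word.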
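For the third step, the abstract says a neural network judges a new word's polarity from its context, with word vectors that combine local and global context. The abstract does not describe the architecture, so this is a deliberately tiny stand-in: the feature is the mean embedding of a ±`window` token window around the new word concatenated with the mean embedding of the whole post, and the "network" is a single sigmoid unit trained by stochastic gradient descent on log-loss. The two-dimensional toy embeddings, the placeholder token `X` for the new word, and all function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def mean(vectors: Sequence[Vector], dim: int) -> Vector:
    """Component-wise mean; a zero vector if there is nothing to average."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def context_feature(tokens: List[str], pos: int, emb: Dict[str, Vector],
                    window: int = 2) -> Vector:
    """Local context (tokens within +/- window of pos) concatenated with
    global context (every other token in the post). Unknown tokens are skipped."""
    dim = len(next(iter(emb.values())))
    local = [emb[t] for j, t in enumerate(tokens)
             if j != pos and abs(j - pos) <= window and t in emb]
    whole = [emb[t] for j, t in enumerate(tokens) if j != pos and t in emb]
    return mean(local, dim) + mean(whole, dim)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(xs: List[Vector], ys: List[int], lr: float = 0.5,
          epochs: int = 200) -> Vector:
    """One sigmoid unit trained by SGD on log-loss; the last weight is the bias."""
    w = [0.0] * (len(xs[0]) + 1)
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            # zip(w, x) stops at len(x), so the bias weight is added separately.
            grad = sigmoid(sum(a * b for a, b in zip(w, x)) + w[-1]) - y
            for i, xi in enumerate(x):
                w[i] -= lr * grad * xi
            w[-1] -= lr * grad
    return w


def positive_prob(w: Vector, x: Vector) -> float:
    """Probability that the new word is used positively in this context."""
    return sigmoid(sum(a * b for a, b in zip(w, x)) + w[-1])


emb = {"今天": [0.0, 0.5], "开心": [1.0, 0.0], "喜欢": [0.9, 0.1],
       "难过": [-1.0, 0.0], "讨厌": [-0.9, -0.1]}  # toy embeddings
pos_post = ["今天", "开心", "X", "喜欢"]
neg_post = ["今天", "难过", "X", "讨厌"]
xs = [context_feature(pos_post, 2, emb), context_feature(neg_post, 2, emb)]
w = train(xs, [1, 0])
print(positive_prob(w, context_feature(["喜欢", "X"], 1, emb)))
```

Concatenating rather than averaging the two context vectors lets the classifier weight nearby words and post-level topic separately, which is one simple way to "combine local and global context".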
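The abstract also describes clustering a word's contexts to obtain separate vectors for a polysemous word, so the same new word can carry different sentiment in different contexts. A minimal sketch, assuming plain k-means (squared Euclidean distance, deterministic initialization from the first `k` points) over context vectors, and, as a swapped-in heuristic rather than the thesis's method, labeling each resulting sense by comparing its centroid's mean cosine similarity to positive versus negative seed-word vectors. The seed vectors, `k`, the toy contexts, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int,
           iters: int = 20) -> Tuple[List[int], List[Vector]]:
    """Plain k-means initialised from the first k points.
    Returns (cluster label per point, centroid per cluster)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each context goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def sense_polarity(centroid: Vector, pos_seeds: List[Vector],
                   neg_seeds: List[Vector]) -> int:
    """+1 if the sense vector is closer (mean cosine) to positive seeds, else -1."""
    pos = sum(cosine(centroid, s) for s in pos_seeds) / len(pos_seeds)
    neg = sum(cosine(centroid, s) for s in neg_seeds) / len(neg_seeds)
    return 1 if pos >= neg else -1


# Toy context vectors for four occurrences of one new word.
contexts = [[1.0, 0.1], [-1.0, 0.0], [0.9, 0.0], [-0.8, 0.1]]
labels, centroids = kmeans(contexts, k=2)
polarities = [sense_polarity(c, [[1.0, 0.0]], [[-1.0, 0.0]]) for c in centroids]
print(labels, polarities)
```

Each occurrence then inherits the polarity of its cluster, so a word used approvingly in some posts and sarcastically in others can receive a different orientation in each context.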
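[Degree-Granting Institution]: Beijing University of Posts and Telecommunications
[Degree Level]: Master's
[Year Conferred]: 2016
[Classification Number]: TP391.1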
Article ID: 2468257
Article link: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/2468257.html