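Research on Sentiment Analysis of Microblog Short Texts Based on an Improved Topic Model
Published: 2018-05-14 19:46
Topic: topic model + sentiment analysis; Source: Southeast University, 2017 master's thesis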
[Abstract]: With the development of Web 2.0 and social media, people increasingly use the Internet to publish and share information. Most user-generated content takes the form of short texts, such as microblog (Weibo) posts and product reviews. Although each post is short, such data is large in volume, updated quickly, and rich in expressions of personal sentiment. Mining this information is important for public opinion monitoring, user analysis, product information analysis, and related fields. However, extracting sentiment-bearing topics from microblog short texts is not easy: the content is very sparse, context is severely lacking, and posts often contain misspellings and newly coined words. As a result, existing topic-model-based methods fail to extract high-quality sentiment-bearing topics from short texts. To address these problems, this thesis proposes two sentiment topic models for microblog short texts. The main work and contributions are as follows. (1) A mixture sentiment topic model that jointly models time and user information is proposed, namely Time-User Sentiment Latent Dirichlet Allocation (TUS-LDA). It aggregates posts published at the same time or by the same user into a long pseudo-document, which enriches the context, alleviates the sparsity of short-text data to some extent, and yields high-quality sentiment-related topics. (2) A mixture sentiment topic model that combines time, user, and hashtag information is proposed, namely the Weibo Sentiment Model (WSM). It extends TUS-LDA and uses the semantic knowledge carried by hashtags to further enrich the context. (3) Through multiple comparative experiments on three real-world datasets, seven models are evaluated on sentiment-related topic extraction and sentiment classification. The two proposed models, TUS-LDA and WSM, both outperform the other five baseline models, and WSM performs slightly better than TUS-LDA. Both models extract high-quality sentiment-related topics, which is of great help to product sentiment analysis and public opinion analysis. (4) A microblog sentiment analysis system, WSAS, is designed and implemented with WSM at its core.
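The aggregation idea described in the abstract can be illustrated with a small sketch. The abstract does not spell out how the pseudo-documents are formed inside TUS-LDA or WSM, so the code below treats aggregation as a plain preprocessing step, which is an assumption and not the thesis's actual inference procedure. The record layout `(user_id, day, text)`, the sample posts, and the function name `build_pseudo_documents` are all hypothetical.

```python
from collections import defaultdict
import re

# Hypothetical post records: (user_id, day, text). Not taken from the thesis.
posts = [
    ("u1", "2017-03-01", "#worldcup# great match tonight"),
    ("u2", "2017-03-01", "so happy about the result #worldcup#"),
    ("u1", "2017-03-02", "new phone battery is terrible"),
]

# Matches the word characters that directly follow a '#'.
HASHTAG = re.compile(r"#(\w+)")


def build_pseudo_documents(posts):
    """Pool the tokens of short posts into longer pseudo-documents.

    Each post contributes its tokens to three kinds of pseudo-document:
    one per user, one per day, and one per hashtag it contains.
    """
    docs = defaultdict(list)
    for user, day, text in posts:
        # Naive tokenizer (assumption): runs of word characters.
        tokens = re.findall(r"\w+", text)
        docs[("user", user)].extend(tokens)
        docs[("time", day)].extend(tokens)
        for tag in set(HASHTAG.findall(text)):
            docs[("hashtag", tag)].extend(tokens)
    return dict(docs)


pseudo_docs = build_pseudo_documents(posts)
for key, tokens in sorted(pseudo_docs.items()):
    print(key, len(tokens))
```

Each pseudo-document is longer than any single post, so word co-occurrence statistics are less sparse when a topic model is later fitted on them. Real Chinese microblog text would need a proper word segmenter in place of the naive tokenizer used here.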
[Degree-granting institution]: Southeast University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.1
Article ID: 1889222
Article link: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/1889222.html