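Design and Implementation of a Weibo Public Opinion Analysis System
Published: 2018-06-26 02:17
Topics: Weibo public opinion + Weibo API; Source: South China University of Technology, 2015 master's thesis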
[Abstract]: With the rapid development of the Internet, Weibo, an emerging social networking platform, has come into wide use. While meeting people's need for online social interaction, Weibo has also become an important arena in which online public opinion arises and spreads, and the need for Weibo public opinion analysis has become increasingly evident. Weibo posts are short, spread quickly, and are produced in large volumes, so Weibo public opinion analysis differs in many ways from traditional online public opinion analysis. This thesis studies the techniques relevant to online public opinion analysis and, taking the characteristics of Weibo into account, designs and implements a Weibo public opinion analysis system. The system has the following main features. First, it collects Weibo data in real time by calling the Weibo API and applies a Weibo filtering strategy to filter the raw data. Second, it calls the NLPIR system interface to perform Chinese word segmentation and part-of-speech tagging on Weibo text, and builds a custom user dictionary, which gives users more control over the segmentation process. Third, it applies a stop-word filtering strategy that works on three aspects (part of speech, word length, and a stop-word list) to remove words that carry little information. Fourth, it filters low-frequency words from the Weibo text collection, builds an LDA topic model on the result, uses perplexity as the evaluation criterion to select the optimal number of topics, and finally represents each Weibo text as a topic vector. Fifth, it uses the Jensen-Shannon distance as the similarity measure to perform K-MEANS clustering on the Weibo text collection and, considering the relationship between the optimal number of topics of the LDA model and the number of text categories, sets the initial K value in relation to the optimal number of topics. Sixth, starting from the topic probability distribution of the Weibo texts, it extracts topic information from the clustering results and, taking the temporal characteristics of Weibo into account, evaluates and ranks topic popularity by the number of posts and their growth rate.
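The fifth step, clustering topic vectors with the Jensen-Shannon distance, can be illustrated with a minimal sketch. It is not the thesis's implementation. It makes several assumptions: the distance is taken as the square root of the base-2 Jensen-Shannon divergence, cluster centers are arithmetic means of their members, and the first k vectors seed the centers. The names `js_distance` and `kmeans_js` and the sample vectors are hypothetical. In the thesis the initial K is tied to the optimal LDA topic count; here `k` is simply a parameter.

```python
import math


def js_distance(p, q):
    """Jensen-Shannon distance between two discrete distributions.

    Assumption: defined as the square root of the JS divergence with
    base-2 logarithms, so the result lies in [0, 1].
    """
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Kullback-Leibler divergence; terms with zero mass contribute nothing.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return math.sqrt(max(0.0, (kl(p, m) + kl(q, m)) / 2))


def kmeans_js(vectors, k, iterations=20):
    """K-means-style clustering of topic vectors under the JS distance.

    Assumptions: centers start as the first k vectors and are updated to
    the arithmetic mean of their members, which keeps them valid
    probability distributions.
    """
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest center.
        labels = [
            min(range(k), key=lambda c: js_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: recompute each non-empty cluster's center.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical topic vectors for four posts over three LDA topics.
topic_vectors = [
    [0.90, 0.05, 0.05],
    [0.80, 0.10, 0.10],
    [0.05, 0.90, 0.05],
    [0.10, 0.80, 0.10],
]
labels, centroids = kmeans_js(topic_vectors, k=2)
print(labels)
```

The Jensen-Shannon distance is symmetric and bounded, which makes it a convenient measure for comparing topic probability distributions; a production system would more likely rely on a tested library routine than on this hand-written loop.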
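[Degree-granting institution]: South China University of Technology
[Degree level]: Master's
[Year degree granted]: 2015
[Classification number]: TP391.1; TP393.092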
Article ID: 2068618
Article link: https://www.wllwen.com/guanlilunwen/ydhl/2068618.html