Research on Emergency Detection Based on Weibo
Published: 2019-03-14 10:15
[Abstract]: As an emerging social media platform, Weibo spreads information quickly, is highly timely, and covers a broad range of content, which has made it an important channel for the rapid aggregation and dissemination of information about emergencies. However, the exponential growth of Weibo data makes it hard for users to follow the details of an event in time. Posting on Weibo is also largely unrestricted, so emergencies are easily spread maliciously, which poses a serious risk to national security and social stability. Detecting emergencies accurately and efficiently from massive Weibo data is therefore important: it helps users obtain key information in real time and reduces the panic that emergencies cause, and it helps emergency management agencies track how an event develops, guide public opinion appropriately, and obtain decision support for public opinion emergency management. Weibo data is noisy, and its texts are short, sparse, and irregular, all of which make emergency detection difficult. This thesis analyzes the burst characteristics of the period in which an emergency occurs and, taking the properties of Weibo data into account, studies burst-feature-centered emergency detection and the analysis of public opinion heat. For detection, it first considers both how well a word expresses a topic and how bursty it is, introduces a reference time window mechanism, and designs selection and calculation methods for four kinds of features: term frequency, document frequency, hashtags, and term frequency growth rate. On this basis it proposes a burst topic word extraction algorithm based on a dynamic threshold.
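The burst topic word extraction described above can be illustrated with a minimal sketch. The abstract does not give the thesis's formulas, so everything concrete here is an assumption made for illustration: the way term frequency, document frequency, and hashtags are combined into one weight, the `hashtag_boost` value, the growth-rate formula against the reference windows, and the "mean plus `k` standard deviations" form of the dynamic threshold. The function names `term_weights` and `burst_words` are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev


def term_weights(posts, hashtag_boost=2.0):
    """Weight each term in one time window.

    `posts` is a list of (tokens, hashtags) pairs. The weight combines term
    frequency and document frequency, and is multiplied by `hashtag_boost`
    for terms that also appear as a hashtag (an assumed weighting scheme).
    """
    tf, df, tagged = Counter(), Counter(), set()
    for tokens, hashtags in posts:
        tf.update(tokens)
        df.update(set(tokens))
        tagged.update(hashtags)
    n = max(len(posts), 1)
    return {
        t: tf[t] * (df[t] / n) * (hashtag_boost if t in tagged else 1.0)
        for t in tf
    }


def burst_words(current, reference_windows, k=1.0):
    """Return terms whose weight grew sharply relative to the reference windows."""
    cur = term_weights(current)
    refs = [term_weights(w) for w in reference_windows]
    scores = {}
    for term, weight in cur.items():
        # Baseline: the term's average weight over the reference time windows.
        baseline = mean(r.get(term, 0.0) for r in refs) if refs else 0.0
        # Growth rate, smoothed by +1 so unseen terms do not divide by zero.
        scores[term] = (weight - baseline) / (baseline + 1.0)
    if not scores:
        return []
    values = list(scores.values())
    # Dynamic threshold: recomputed from the score distribution of each window.
    threshold = mean(values) + k * pstdev(values)
    return sorted(t for t, s in scores.items() if s > threshold)


if __name__ == "__main__":
    reference = [
        [(["weather", "lunch"], []), (["lunch", "movie"], [])],
        [(["weather", "movie"], []), (["lunch", "weather"], [])],
    ]
    current = [
        (["earthquake", "sichuan", "weather"], ["earthquake"]),
        (["earthquake", "rescue"], ["earthquake"]),
        (["earthquake", "lunch"], []),
        (["movie", "weather"], []),
    ]
    print(burst_words(current, reference))
```

Because the threshold is derived from the scores of the current window rather than fixed in advance, a quiet window and a busy window are judged against their own distributions, which is one plausible reading of "dynamic threshold".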
Experimental results show that the burst topic word extraction method accurately identifies burst topic words that effectively characterize events. The thesis then proposes an emergency detection algorithm based on burst topic words and agglomerative hierarchical clustering. The algorithm uses burst topic words as burst features and represents each Weibo post as a feature vector. It applies a filtering strategy based on the three elements of a Weibo event to retain high-quality posts, builds a post similarity matrix using Jaccard overlap as the similarity measure, and detects emergencies with agglomerative hierarchical clustering. In the experiments the detection method reaches 80% accuracy, which supports its feasibility and effectiveness. For the detected emergencies, the thesis analyzes the network characteristics of Weibo users and the ways posts propagate, and proposes a public opinion heat model that combines two perspectives: user influence and post propagation influence. Heat is computed per unit time slice to analyze how it changes over time. A case study shows that the model divides the public opinion life cycle of an emergency fairly accurately and gives an overall view of the event's development trend and patterns of change.
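The clustering step described above (burst-word features, Jaccard similarity, agglomerative hierarchical clustering) can be sketched as follows. This is an illustration under stated assumptions, not the thesis's implementation: the abstract does not specify the linkage criterion or the stopping rule, so average linkage and a `min_similarity` cutoff are assumed here, and the three-element filtering strategy is replaced by a much simpler stand-in that drops posts containing no burst word. The names `jaccard` and `cluster_posts` are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_posts(posts, burst_words, min_similarity=0.5):
    """Group tokenized posts into candidate events.

    Each post is reduced to the set of burst words it contains. Clusters are
    merged bottom-up by average pairwise Jaccard similarity (assumed linkage)
    until no pair of clusters reaches `min_similarity` (assumed stopping rule).
    Returns a list of clusters, each a list of indices into `posts`.
    """
    vocab = set(burst_words)
    features = [set(tokens) & vocab for tokens in posts]
    # Simplified stand-in for the filtering step: keep posts with a burst word.
    kept = [i for i, f in enumerate(features) if f]
    # Pairwise similarity matrix, stored for index pairs with i < j.
    sim = {
        (i, j): jaccard(features[i], features[j])
        for i in kept
        for j in kept
        if i < j
    }
    clusters = [[i] for i in kept]

    def linkage(c1, c2):
        pairs = [(min(i, j), max(i, j)) for i in c1 for j in c2]
        return sum(sim[p] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        # Find the most similar pair of clusters.
        best, a, b = max(
            (linkage(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        if best < min_similarity:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


if __name__ == "__main__":
    posts = [
        ["earthquake", "sichuan", "rescue"],
        ["earthquake", "rescue", "team"],
        ["typhoon", "landfall", "fujian"],
        ["typhoon", "fujian", "rain"],
        ["lunch", "movie"],
    ]
    burst = ["earthquake", "sichuan", "rescue", "typhoon", "landfall", "fujian"]
    print(cluster_posts(posts, burst))
```

Stopping on a similarity cutoff instead of a fixed cluster count suits this setting, since the number of emergencies in a time window is not known in advance.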
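[Degree-granting institution]: Nanjing University of Science and Technology
[Degree level]: Master's
[Year degree conferred]: 2016
[Classification number]: TP393.092; G206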
Article ID: 2439892
Article link: https://www.wllwen.com/xinwenchuanbolunwen/2439892.html