Research on Weibo Hot Event Detection Based on Topic Models and Community Detection
Published: 2018-07-17 03:56
[Abstract]: With its simple, fast mechanisms for generating and spreading information, Weibo, an emerging social networking medium, has become ubiquitous in the Web 2.0 era. Compared with traditional media, Weibo reports and spreads news events in a more timely and efficient way, so hot event detection on Weibo data has become an active research topic in recent years. Several characteristics of Weibo, however, make the task challenging. First, the Weibo data stream contains a large number of worthless, meaningless "noise" posts, and effectively separating interesting event posts from this noise is the primary challenge of Weibo hot event detection. Second, a post is limited to 140 characters, so the text is extremely sparse, and posts often contain spelling and grammatical errors, mixed languages, and the like. As a result, traditional text analysis techniques cannot be applied directly to Weibo event detection.
This thesis first surveys existing Weibo hot event detection techniques in China and abroad and then, addressing their shortcomings, studies and extends hot event detection for both static and dynamic Weibo data. For static event detection, it proposes a text classification method based on a topic model and a Bayesian method to detect event posts in static Weibo data. The method maps static Weibo data to a topic-space representation, mines the relationship between topics and text types, and then classifies each post according to whether the category of its topic is the event class. For dynamic event detection, it proposes a method based on community detection and graph kernel computation. The method first selects event words using a dynamic event word selection algorithm proposed in the thesis. Then, for each time slice, it builds a Weibo semantic graph from the posts in the real-time data stream according to the state of the event words they contain: each post is a node, and an edge links two nodes that share an event word. A community detection algorithm then finds the event communities in each time slice's semantic graph and returns the key-node posts of each community as the description of the event that the community reflects. The thesis also proposes an encoding scheme based on topic semantics that assigns a bit-array label to every node in an event community graph, yielding a labeled event community graph. Finally, a graph kernel algorithm computes the similarity between labeled event community graphs in adjacent time slices, and event communities that describe the same event are matched according to the result, which achieves event tracking. Both methods were applied to Chinese Weibo data crawled in real time, and the experimental results show that both achieve the expected results.
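The graph-construction and community-detection steps of the dynamic method can be illustrated with a minimal Python sketch for a single time slice. It is not the thesis's implementation. The abstract does not name its community detection algorithm or its key-node criterion, so connected components stand in for community detection and the highest-degree post stands in for the key node. The function names, the sample posts, and the event words are all hypothetical, and event words are matched by plain substring search.

```python
from collections import defaultdict
from itertools import combinations


def build_semantic_graph(posts, event_words):
    """Nodes are post indices; an edge links two posts that share an event word."""
    word_to_posts = defaultdict(set)
    for idx, text in enumerate(posts):
        for word in event_words:
            if word in text:  # assumption: plain substring match
                word_to_posts[word].add(idx)
    graph = {idx: set() for idx in range(len(posts))}
    for members in word_to_posts.values():
        for a, b in combinations(sorted(members), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def find_communities(graph):
    """Stand-in for community detection: connected components with >= 2 posts."""
    seen, communities = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        if len(component) > 1:  # isolated posts are treated as noise
            communities.append(component)
    return communities


def key_post(graph, community):
    """Assumed key-node rule: highest degree inside the community, lowest index on ties."""
    return max(sorted(community), key=lambda n: len(graph[n] & community))


# Hypothetical posts from one time slice and hypothetical event words.
posts = [
    "earthquake hits city",
    "earthquake rescue underway",
    "rescue teams arrive",
    "nice lunch today",
    "concert tickets on sale",
    "concert tonight",
]
event_words = {"earthquake", "rescue", "concert"}

graph = build_semantic_graph(posts, event_words)
communities = find_communities(graph)
for community in communities:
    print(sorted(community), "->", posts[key_post(graph, community)])
```

In this toy slice, the post with no event word stays isolated and is discarded, while the remaining posts fall into two event communities. A real pipeline would replace the connected-components step with a proper community detection algorithm, because a single shared word can merge unrelated events into one component.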
[Degree-granting institution]: Southwest University
[Degree level]: Master's
[Year degree granted]: 2014
[Classification number]: TP393.092; TP391.1
Article ID: 2128903
Article link: https://www.wllwen.com/guanlilunwen/ydhl/2128903.html