Node Selection Strategies for a Hot-Event Detection Subnet in Sina Weibo
Published: 2018-10-26 08:18
[Abstract]: Microblogging has become a popular means of exchanging and spreading information. A large number of social events propagate through microblogs, so detecting hot events in them is increasingly important. Hot-event detection nevertheless faces serious challenges: the user base is very large and relatively active, so a huge volume of posts can be produced in a short time, and processing that volume requires substantial computing power. This thesis implements a hot-event detection system on Sina Weibo, the largest microblogging service provider in China. Because processing every post produced in a given period in real time is not economically feasible, the thesis adopts a strategy of monitoring the posts of only a small subset of Sina Weibo users, so that hot events can be detected with limited resources. The main research goal is to provide node selection algorithms for systems that detect hot events by monitoring the nodes of such a subnet. The thesis first introduces the concept of hot-event coverage and proposes a subnet node selection algorithm aimed at covering all sample hot events. After studying this algorithm and its shortcomings, the thesis introduces the concept of a node's hot-event participation probability and, based on it, proposes a probabilistic algorithm for selecting subnet nodes. Because the cost of monitoring the posts of different subnet nodes varies, the thesis finally introduces the concept of node cost and, combining it with the participation probability, proposes an optimization algorithm. A total of 525 hot events were collected, of which 294 serve as the training set and 231 as the test set, and the three proposed node selection algorithms were each applied to this data set. The results show that, compared with the other algorithms, the optimization algorithm detects more hot events at a lower system cost, with a hot-event detection rate of 70%.
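The abstract does not give the algorithms themselves, so the following is only a minimal sketch of the general idea, not the thesis's method. It swaps in the standard greedy set-cover heuristic: keep picking the user who covers the most still-uncovered sample events per unit of monitoring cost. The names `participation`, `cost`, `greedy_cover`, and `participation_probability`, and the reading of participation probability as the fraction of training events a user took part in, are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def participation_probability(participation: Dict[str, Set[str]],
                              n_events: int) -> Dict[str, float]:
    """Assumed definition: share of training events each user took part in."""
    return {user: len(events) / n_events
            for user, events in participation.items()}


def greedy_cover(participation: Dict[str, Set[str]],
                 cost: Dict[str, float]) -> List[str]:
    """Greedy set cover: pick users until every sample event is covered.

    participation maps a user id to the set of sample events the user
    posted about; cost maps a user id to a positive monitoring cost
    (hypothetical; users missing from cost default to 1.0).
    """
    uncovered: Set[str] = set()
    for events in participation.values():
        uncovered |= events

    chosen: List[str] = []
    while uncovered:
        candidates = [u for u in sorted(participation) if u not in chosen]
        # Best ratio of newly covered events to monitoring cost.
        best = max(
            candidates,
            key=lambda u: len(participation[u] & uncovered) / cost.get(u, 1.0),
        )
        gained = participation[best] & uncovered
        if not gained:  # defensive: nothing left that any user can cover
            break
        chosen.append(best)
        uncovered -= gained
    return chosen


if __name__ == "__main__":
    sample = {
        "a": {"e1", "e2", "e3"},
        "b": {"e1"},
        "c": {"e4"},
        "d": {"e3", "e4"},
    }
    print(greedy_cover(sample, {}))           # uniform cost
    print(greedy_cover(sample, {"a": 10.0}))  # user "a" is expensive to monitor
    print(participation_probability(sample, 4))
```

With uniform costs the heuristic behaves like a pure coverage-based selection; raising one user's cost shows how cost-awareness can change which nodes end up in the subnet. A greedy heuristic does not guarantee a minimum-cost subnet, which is one reason a separate optimization formulation is of interest.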
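[Degree-granting institution]: Shanghai Jiao Tong University
[Degree level]: Master's
[Year degree awarded]: 2014
[Classification number]: TP393.092
Article ID: 2295127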
Article link: https://www.wllwen.com/guanlilunwen/ydhl/2295127.html