Research on Spam Filtering Techniques for Social Networks

Published: 2018-09-18 19:48

[Abstract]: With the development of Web 2.0 technology, social media has become the largest and most active kind of social platform, giving hundreds of millions of users high-quality communication channels. However, as people share, communicate, and interact on social networks, the volume of spam keeps growing. A technology is urgently needed to clean up cyberspace and foster a healthy social ecosystem, so spam filtering for social networks has become a research hotspot. Machine-learning classification is widely used to filter spam on social platforms because it offers high accuracy at low cost. Taking Sina Weibo as its research object, this thesis makes the following contributions. First, starting from how information propagates in social network services, it analyzes the propagation patterns of spam on the Weibo platform, designs a machine-learning filtering technique that identifies suspicious accounts on Sina Weibo, and implements a spam filtering system based on three models: logistic regression, support vector machine (SVM), and random forest. Second, it extracts a variety of discriminative features from Weibo accounts and classifies them with machine learning models. Features of spam posts are extracted from both user behavior and content behavior, and a social relationship graph is used to analyze data flow and propagation patterns on the platform. An information propagation graph, built around Weibo messages, describes the affinity between users. The performance of the complete filtering system is then evaluated through data analysis and experiments. Third, with practical deployment in mind, it proposes filtering spam posts with online active learning. Active learning reduces the amount of labeled data the system needs and lowers its time complexity while still maintaining good filtering performance. Finally, spammers frequently hijack legitimate users' accounts and use them to inflate other accounts' follower counts and to repost content on request. The thesis proposes a method based on the sequential probability ratio test (SPRT) to detect such zombie accounts, and the resulting detection system effectively identifies zombie accounts in social networks.
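The abstract does not say which active-learning strategy the thesis uses. One common choice, shown here only as a hedged sketch, is uncertainty sampling on top of an online logistic regression: each incoming post is scored, and a human label is requested (and the model updated) only when the predicted spam probability falls close to 0.5. The class name `OnlineActiveFilter`, the `margin` and learning-rate defaults, and the feature vectors are hypothetical illustrations, not details taken from the thesis.

```python
import math
from typing import Callable, List, Sequence


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class OnlineActiveFilter:
    """Hypothetical online spam filter using uncertainty sampling.

    A logistic-regression model scores each post as it arrives; the true
    label is requested (and the model updated) only when the predicted
    spam probability is within `margin` of 0.5.
    """

    def __init__(self, n_features: int, lr: float = 0.5, margin: float = 0.2) -> None:
        self.w: List[float] = [0.0] * n_features
        self.b = 0.0
        self.lr = lr
        self.margin = margin
        self.queries = 0  # number of labels requested so far

    def predict_proba(self, x: Sequence[float]) -> float:
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        return _sigmoid(z)

    def _update(self, x: Sequence[float], y: int) -> None:
        # One stochastic gradient step on the logistic (log) loss
        err = self.predict_proba(x) - y
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err

    def process(self, x: Sequence[float], oracle: Callable[[], int]) -> int:
        """Classify one post (1 = spam); query `oracle` only if uncertain."""
        p = self.predict_proba(x)
        if abs(p - 0.5) < self.margin:
            self.queries += 1
            self._update(x, oracle())
        return int(p >= 0.5)  # prediction made before any update
```

For example, feeding a stream that alternates `([1.0, 0.0], 1)` and `([0.0, 1.0], 0)` through `process`, with an oracle that simply returns the true label in place of a human annotator, leaves the model classifying both kinds of post correctly while requesting far fewer labels than there are posts: once every post's probability lies outside the margin, no further labels are requested. That is how this kind of active learning cuts labeling cost; the trade-off is that the model never learns from posts it is already confident about.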
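The abstract names the sequential probability ratio test but not the quantity being tested. The sketch below assumes, hypothetically, that each observed action of an account is reduced to a 0/1 flag (1 if some upstream filter marks it suspicious) and applies Wald's SPRT with H0 "normal account, flag probability p0" against H1 "zombie account, flag probability p1". The function name `sprt_classify` and the default values of `p0`, `p1`, `alpha`, and `beta` are illustrative assumptions, not parameters reported in the thesis.

```python
import math
from typing import Iterable, Optional, Tuple


def sprt_classify(
    observations: Iterable[int],
    p0: float = 0.1,
    p1: float = 0.6,
    alpha: float = 0.01,
    beta: float = 0.01,
) -> Tuple[Optional[str], int]:
    """Wald's SPRT over a stream of 0/1 observations for one account.

    H0 ("normal"): each observation is 1 with probability p0.
    H1 ("zombie"): each observation is 1 with probability p1 (p1 > p0).
    alpha: target false-positive rate (flagging a normal account).
    beta: target false-negative rate (missing a zombie account).
    Returns (decision, observations_used); decision is None if the
    stream ends before either threshold is crossed.
    """
    if not 0 < p0 < p1 < 1:
        raise ValueError("require 0 < p0 < p1 < 1")
    upper = math.log((1 - beta) / alpha)  # crossing it accepts H1
    lower = math.log(beta / (1 - alpha))  # crossing it accepts H0
    step_flagged = math.log(p1 / p0)  # log-likelihood ratio of a 1
    step_clean = math.log((1 - p1) / (1 - p0))  # log-likelihood ratio of a 0
    llr = 0.0
    n = 0
    for x in observations:
        n += 1
        llr += step_flagged if x else step_clean
        if llr >= upper:
            return "zombie", n
        if llr <= lower:
            return "normal", n
    return None, n
```

With the defaults, an account whose first three observed actions are all flagged is declared a zombie after three observations, while one with six unflagged actions in a row is declared normal; mixed histories keep the test running until enough evidence accumulates. The appeal of SPRT for this task is that it stops as early as the evidence allows, with error rates bounded approximately by `alpha` and `beta`.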
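[Degree-granting institution]: Harbin University of Science and Technology
[Degree level]: Master's
[Year conferred]: 2014
[Classification number]: TP393.09; TP181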
