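Research on Burst Hot Event Detection and Credibility Analysis Methods for Social Media
Published: 2018-05-04 20:01
Topics: burst hot events + credibility analysis; Source: Harbin Institute of Technology, 2013 master's thesis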
[Abstract]: In recent years, social media has developed rapidly, giving people unprecedented room to express their opinions. Social media, typified by microblogs (Weibo), is becoming the platform where many hot events first break, so quickly discovering and detecting burst hot events in social media is critical for applications such as public opinion analysis. At the same time, groundless speculation and rumor-mongering occur frequently on social media and cause serious harm. Assessing the credibility of events in social media and identifying online rumors can reduce this harm and help maintain economic and social stability.
At present, burst hot events are detected mainly by detecting hot words. In practice, this approach often misidentifies periodically recurring bursts, and advertisements posted in concentrated batches over a short time, as burst hot events. In research on the credibility of social media events, the two main approaches are credibility ranking and classifier-based discrimination, but most methods do not consider how users' opinions and sentiment orientation contribute to identifying rumor events. The mining of user features is also insufficient.
To address these problems, this thesis studies methods for detecting burst hot events and analyzing their credibility. First, it designs and implements a burst hot event detection method based on hot-word identification and originality filtering. Burst hot words are mined from the text content of microblog posts and their propagation characteristics; the hot words are then clustered into highly correlated clusters, from which burst hot events are discovered. The thesis also proposes using topic originality as the main feature to filter out advertising events that closely resemble hot events in content and propagation pattern, which effectively improves detection precision. On this basis, the thesis studies event credibility analysis and rumor detection based on feature mining. For each detected burst hot event, a classifier is built from features of the event's text content, posting users, topic, and propagation in social media to identify false rumor events.
The main contributions are as follows. First, the thesis designs and implements a hot-word identification method that uses a lookback window and jointly considers a word's frequency and its growth rate, which effectively reduces false detections of periodic events. Second, it proposes and designs a topic originality metric and uses it to filter the advertising events common in real deployments, improving the accuracy of burst hot event detection. Finally, the proposed method for event credibility analysis using multi-view features detects rumors in social media well. The set of discriminative features for rumor events proposed in the thesis should also benefit research in related fields.
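The abstract gives no formulas, so the following is only a minimal sketch of the two detection ideas it names: hot-word identification over a lookback window using both word frequency and growth rate, and an originality score for filtering advertising-like topics. Every name, threshold, and scoring formula here (`burst_growth`, `detect_hot_words`, `topic_originality`, `min_count`, `min_growth`, comparing against the lookback peak, and measuring originality as the share of distinct post texts) is an assumption made for illustration, not the thesis's actual definition.

```python
import math
from collections import Counter


def burst_growth(count, past_counts, smoothing=1.0):
    """Relative growth of `count` over the peak seen in the lookback window.

    Comparing against the peak (an assumption of this sketch) means a word
    that already spiked to a similar level earlier, such as a periodically
    recurring term, gets a growth value near zero.
    """
    peak = max(past_counts, default=0)
    return (count - peak) / (peak + smoothing)


def detect_hot_words(current, lookback, min_count=10, min_growth=2.0):
    """Return (word, score) pairs for burst hot words, highest score first.

    current  -- Counter of word frequencies in the current time slot
    lookback -- list of Counters, one per earlier time slot
    """
    hot = []
    for word, count in current.items():
        if count < min_count:  # frequency criterion
            continue
        growth = burst_growth(count, [slot[word] for slot in lookback])
        if growth >= min_growth:  # growth-rate criterion
            hot.append((word, growth * math.log1p(count)))
    return sorted(hot, key=lambda pair: pair[1], reverse=True)


def topic_originality(post_texts):
    """Share of distinct texts among a topic's posts.

    Hypothetical proxy: a topic pushed by many copies of the same
    advertisement scores low, while an organically discussed event
    scores closer to 1.
    """
    if not post_texts:
        return 0.0
    return len(set(post_texts)) / len(post_texts)


if __name__ == "__main__":
    lookback = [
        Counter({"morning": 50}),
        Counter({"morning": 3, "earthquake": 1}),
        Counter({"morning": 48}),
        Counter({"morning": 2}),
    ]
    current = Counter({"morning": 52, "earthquake": 40, "rare": 3})
    print([word for word, _ in detect_hot_words(current, lookback)])
    # prints ['earthquake']
    print(topic_originality(["buy now", "buy now", "buy now", "great deal"]))
    # prints 0.5
```

In this toy data, "morning" is frequent but has already peaked at a similar level inside the lookback window, so its growth is close to zero and it is not reported. "rare" fails the frequency criterion. A real system would still need the clustering step and a tuned originality threshold, neither of which the abstract specifies.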
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP393.09
Article ID: 1844432
Article link: https://www.wllwen.com/wenyilunwen/guanggaoshejilunwen/1844432.html