Research on Rumor Identification Mechanisms in the Micro-Network Environment

Published: 2019-02-16 05:58

【Abstract】: The widespread use of social platforms such as Weibo and WeChat has shortened the information dissemination cycle and widened its reach, amplifying the impact and harm of rumors. How to identify and then block rumors has become a hot topic in the field of information dissemination. Based on the maximum entropy model, an improved maximum entropy model, and the explosiveness of rumors, this thesis constructs a mechanism for identifying rumor information in the micro-network environment. The thesis carries out the following four main tasks.

First, the maximum entropy model is applied to rumor identification. Feature functions are determined according to the characteristics of rumors, an experimental training set is designed, and experiments are run with different numbers of features to find the number best suited to rumor identification. A comparison with the results of the support vector machine (SVM) model, the BP neural network model, the Bayesian model, and the K-means algorithm shows that the accuracy of rumor identification based on the maximum entropy model is comparable to that of the Bayesian model and the K-means algorithm, which leaves room for improvement.

Second, the maximum entropy model is improved to raise the accuracy of rumor identification. A new sample construction method, center distance clipping, is proposed to address blurred boundaries and isolated samples in imbalanced data classification. The method represents each message as a weighted vector, uses the distance between vectors to express the similarity of messages, defines isolated points by the distance from a sample to the center of each class, and clips boundary samples. It resolves the problems of numerous isolated points and blurred boundaries in the original samples. A new feature selection method, the difference calculation method, is also proposed. It takes into account both the effect of a feature's occurrence count on rumor identification and the low reference value of features that appear frequently in both rumor and non-rumor messages. On this basis it computes a difference value DC(f) for each feature, ranks the features by this value, and selects the n features with the largest difference values for rumor identification. The feature functions of the maximum entropy model are also improved to make the model better suited to rumor identification. After constructing the identification mechanism based on the improved maximum entropy model, the thesis carries out rumor identification experiments in which the selection of the training set is improved and optimized with center distance clipping, and finds the optimal number of features for rumor identification in the micro-network environment. The results of the improved maximum entropy model are compared with those of the original model and with those of the SVM model, the BP neural network model, the Bayesian model, and the K-means algorithm. The experiments show that identification with the optimized training set and feature functions is markedly better than before optimization, and that its accuracy exceeds that of the other classification methods.

Third, messages that the maximum entropy model classifies ambiguously are identified further on the basis of the explosiveness of rumors. A game model between rumor makers and rumor spreaders and the ET (Explosion-Trust) model of rumors are established. Experiments reveal a characteristic shared by widely spread rumors: their explosiveness values fall within the range [0.695, 0.795]. The explosiveness value of a rumor therefore becomes an important basis for rumor identification.
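The abstract describes center distance clipping only in outline, so the following is a minimal Python sketch of one possible reading, not the thesis's actual procedure. It assumes Euclidean distance between the weighted vectors, treats a sample as isolated when it lies more than `outlier_factor` times the median distance from its own class center, and treats a sample as a boundary sample when it is nearly as close to another class center as to its own. The function name `center_distance_clip` and the parameters `outlier_factor` and `margin` are hypothetical and do not come from the source.

```python
import numpy as np


def center_distance_clip(X, y, outlier_factor=3.0, margin=0.2):
    """Return a boolean mask marking the samples to keep.

    X: (n_samples, n_features) array of weighted feature vectors.
    y: (n_samples,) array of class labels (e.g. rumor / non-rumor).
    outlier_factor, margin: illustrative thresholds (assumptions).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = len(y)

    # Map each label to a class index and compute one center per class.
    classes, own_idx = np.unique(y, return_inverse=True)
    centers = np.stack([X[own_idx == k].mean(axis=0)
                        for k in range(len(classes))])

    # Distance from every sample to every class center: (n_samples, n_classes).
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    own = dists[np.arange(n), own_idx]

    # Distance to the nearest center of any *other* class.
    other = dists.copy()
    other[np.arange(n), own_idx] = np.inf
    nearest_other = other.min(axis=1)

    # Isolated samples: unusually far from their own class center.
    isolated = np.zeros(n, dtype=bool)
    for k in range(len(classes)):
        members = own_idx == k
        cutoff = outlier_factor * np.median(own[members])
        isolated |= members & (own > cutoff)

    # Boundary samples: almost equidistant from own and nearest other center.
    boundary = (nearest_other - own) < margin * (nearest_other + own)

    return ~(isolated | boundary)
```

Calling `X[keep], y[keep]` with `keep = center_distance_clip(X, y)` would give a clipped training set. The median-based cutoff is chosen here only because it is less sensitive to the very outliers it is meant to detect; the thesis may define isolated points differently.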
【Degree-granting institution】: Shandong Normal University
【Degree level】: Master's
【Year degree granted】: 2017
【Classification number】: G206
