An Information Diffusion Model for Microblog Networks Based on Hidden Markov Theory and Conditional Random Fields

Published: 2018-03-30 14:07

Topic: microblog networks　Focus: hidden Markov theory　Source: South China University of Technology, master's thesis, 2014

[Abstract]: With the rapid growth of social networks, the volume of information has exploded and we have entered the era of big data. To better exploit the potential value of social networks, many researchers have studied them from a variety of angles. How can the information in a social network be fully used and effectively controlled and guided? How can the mechanism of information diffusion be understood in depth? How can user behavior in a social network be predicted correctly? One effective research direction is to build an information diffusion model that is accurate, solvable, and elegant. The microblog (Micro-blog), a new type of social network platform, shares features with traditional social networks and also has characteristics of its own. Among existing studies of information diffusion models for microblog networks, very few consider information content, users, and network structure together. Moreover, there is essentially no multi-information diffusion model that is grounded in statistical probability and accounts for both the "competitive" and the "cooperative" relationships between pieces of information. In view of this, this thesis proposes an information diffusion model based on hidden Markov theory (IDMBHMT) and a multi-information diffusion model based on conditional random fields (MIDMBCRF) for microblog networks. First, the thesis surveys the characteristics of information diffusion in microblog networks and the factors that influence it, hidden Markov theory, conditional random field theory, and the methods used to define the feature functions (automatic Chinese text classification, user similarity measurement, and a method for quantifying multi-information interaction), and on this basis constructs the IDMBHMT and MIDMBCRF models. Second, it uses the METIS tool to partition the microblog user relationship network into subgraphs and builds the models on those subgraphs in order to improve model performance. Third, it uses the junction tree algorithm to apply the models to predicting users' forwarding (repost) behavior. Finally, it collects experimental data through the Sina Weibo API (Application Programming Interface) and runs simulation experiments. The experiments analyze the factors that affect the performance of the two models: graph partitioning improves performance, and both models reach peak performance when the subgraph size is 48; "multi-information interaction" affects the forwarding probability in the MIDMBCRF model with an average probability of 43%. At network sizes of 120, 240, and 400, the IDMBHMT and MIDMBCRF models are compared with a forwarding behavior prediction model based on logistic regression (RPMBLR) and a multi-information diffusion model based on LT (Linear-Threshold) (MIDMBLT). Ranked from highest to lowest performance, the four models are: MIDMBCRF, IDMBHMT, RPMBLR, MIDMBLT. The two information diffusion models built in this thesis can be applied to user behavior prediction and public opinion guidance, and the results may also serve as a reference for research in related disciplines.
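[Degree-granting institution]: South China University of Technology
[Degree level]: Master's
[Year degree granted]: 2014
[Classification number]: TP393.092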
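
The abstract names an LT (Linear-Threshold) model as the basis of the MIDMBLT baseline but does not describe it. For orientation, the following is a minimal sketch of the classic single-message Linear-Threshold model only, not the thesis's multi-information variant, whose details are not given here. The function name `linear_threshold`, the dictionary layout, and the fixed thresholds are assumptions made for illustration; in the usual formulation each threshold is drawn uniformly at random from [0, 1].

```python
def linear_threshold(in_weights, thresholds, seeds):
    """Run a classic Linear-Threshold diffusion until no node can activate.

    in_weights: dict mapping node -> {in-neighbor: influence weight}
    thresholds: dict mapping node -> activation threshold in [0, 1]
                (fixed here for reproducibility; an assumption of this sketch)
    seeds:      iterable of initially active nodes (e.g. the original posters)
    Returns the set of nodes that are active at the fixed point.
    """
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for node, threshold in thresholds.items():
            if node in active:
                continue
            # Total influence received from in-neighbors that are already active.
            influence = sum(
                weight
                for neighbor, weight in in_weights.get(node, {}).items()
                if neighbor in active
            )
            # A node activates (forwards the message) once that influence
            # reaches its threshold; activation is never undone.
            if influence >= threshold:
                active.add(node)
                changed = True
    return active


# Hypothetical four-user network: "a" posts, the others may forward.
example_weights = {
    "b": {"a": 0.6},
    "c": {"a": 0.3, "b": 0.3},
    "d": {"c": 0.4},
}
example_thresholds = {"a": 0.5, "b": 0.5, "c": 0.5, "d": 0.5}
reached = linear_threshold(example_weights, example_thresholds, {"a"})
```

Because activation is monotone, updating nodes in place reaches the same final set as the round-by-round formulation; only the order in which nodes activate differs.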
