Research on Multi-Topic Influence Maximization in Large-Scale Social Networks
Published: 2018-07-22 14:36
[Abstract]: In recent years, influence maximization has become a highly active research topic in data mining and is widely applied in social network analysis. However, most existing studies look for the most influential seed nodes without recognizing that different topics call for different seed nodes. Even the few studies that do consider topics ignore that any product or event spreading through a network is made up of multiple topics, so considering a single topic is one-sided and inaccurate. Moreover, users in a network rarely have a single interest, and it is their behavioral preferences that directly determine how influence spreads through a social network. Existing propagation models and algorithms do not account for multiple topics during propagation. To address this, the thesis first applies a topic model to the text data. It then incorporates the resulting topic distributions into the traditional Independent Cascade (IC) model, yielding the Multi-Topic Sensitive Independent Cascade model (MTSIC). The MTSIC model helps select seed nodes that best reflect real-world conditions. In addition, the Topical HITS algorithm is used to obtain each user's authority and conformity, and both are incorporated into the model to make it more accurate. Products or events spreading over the network may be subject to geographic restrictions, so geographic location is also added to the model as an important factor. Traditional influence maximization algorithms are not suited to multi-topic scenarios, so a multi-topic sensitive influence maximization algorithm, the Activation Nodes Similarity algorithm (ANS), is proposed.
Influence maximization analysis on large-scale networks is very time-consuming. To improve efficiency, the thesis also proposes PMT (Parallelization of Multi-Topic algorithm), a Spark-based parallel version of the multi-topic sensitive influence maximization algorithm. Traditional evaluation measures do not reflect the importance of multiple topics, so a new measure, SIS, is proposed to evaluate the effectiveness of influence maximization algorithms. Experiments on the DBLP and Twitter datasets show two results. First, the MTSIC model simulates node activation in real situations more accurately. Second, the ANS algorithm finds seed nodes that, in practice, are more inclined to accept a product or event and propagate it. The experiments also demonstrate the efficiency of the PMT algorithm. Overall, experiments from several perspectives show that the proposed propagation model and algorithms are both effective and efficient.
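The abstract does not give the MTSIC activation formula, so the following is only a minimal sketch under stated assumptions of a generic topic-aware independent cascade. Each edge is assumed to store one activation probability per topic. An item's topic distribution then mixes those into a single edge probability (a hypothetical mixing rule). The sketch omits the authority, conformity and geographic terms the thesis adds. It also uses plain Monte Carlo greedy hill-climbing for seed selection, not the thesis's ANS algorithm, whose details are not given here. The graph layout and the names `edge_probability`, `simulate_cascade`, `expected_spread` and `greedy_seeds` are illustrative, not the author's.

```python
import random
from typing import Dict, List, Set, Tuple

# Hypothetical representation: node -> [(neighbour, per-topic activation probabilities)].
Graph = Dict[int, List[Tuple[int, List[float]]]]


def edge_probability(topic_probs: List[float], item_topics: List[float]) -> float:
    """Assumed mixing rule: weight each per-topic edge probability by the item's topic share."""
    return sum(p * w for p, w in zip(topic_probs, item_topics))


def simulate_cascade(graph: Graph, seeds: Set[int], item_topics: List[float],
                     rng: random.Random) -> int:
    """One IC run: each newly activated node gets one chance to activate each inactive neighbour."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, topic_probs in graph.get(u, []):
                if v not in active and rng.random() < edge_probability(topic_probs, item_topics):
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)


def expected_spread(graph: Graph, seeds: Set[int], item_topics: List[float],
                    runs: int = 1000, seed: int = 0) -> float:
    """Monte Carlo estimate of the expected number of activated nodes."""
    rng = random.Random(seed)  # same seed per estimate, so candidate comparisons are less noisy
    return sum(simulate_cascade(graph, seeds, item_topics, rng) for _ in range(runs)) / runs


def greedy_seeds(graph: Graph, k: int, item_topics: List[float], runs: int = 1000) -> List[int]:
    """Greedy hill-climbing: repeatedly add the node with the largest estimated marginal gain."""
    nodes = set(graph) | {v for edges in graph.values() for v, _ in edges}
    seeds: List[int] = []
    for _ in range(min(k, len(nodes))):
        base = expected_spread(graph, set(seeds), item_topics, runs)
        best, best_gain = None, float("-inf")
        for n in sorted(nodes - set(seeds)):  # sorted so ties break deterministically
            gain = expected_spread(graph, set(seeds) | {n}, item_topics, runs) - base
            if gain > best_gain:
                best, best_gain = n, gain
        seeds.append(best)
    return seeds
```

Consider a toy graph where `0 -> 1 -> 2` uses only topic-0 edges and `3 -> 4` uses only a topic-1 edge. On that graph, `greedy_seeds(g, 1, [1.0, 0.0])` picks node 0, while `greedy_seeds(g, 1, [0.0, 1.0])` picks node 3. The best seed shifts with the item's topic mix, which is the effect the abstract says single-topic models miss.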
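[Degree-granting institution]: Huazhong University of Science and Technology
[Degree level]: Master's
[Year conferred]: 2016
[Classification number]: TP311.13; G206
Document ID: 2137780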
Link: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/2137780.html