Research on Link Prediction Algorithms Based on Node Characteristics in Social Networks

Published: 2018-03-18 18:09

Topic: social networks; focus: link prediction; source: master's thesis, Xinjiang University, 2017; document type: dissertation

【Abstract】: Computer technology is advancing rapidly, and network technology has changed the way people live. The popularity of social networking sites and the spread of smart devices mean that people leave massive amounts of real data in virtual networks, which provides a data foundation for network analysis. As social networks keep growing, people enjoy the convenience of a rich digital life but also suffer from information overload: they cannot quickly and effectively extract the most relevant information from massive data. Predicting links that may exist in a network, or that already exist but have not been observed, helps with problems such as analysing missing network data and understanding how complex networks evolve, so this research is significant for social network analysis. Predicting future links from the links already present in a network is a fundamental problem of network analysis, and it is also in wide demand in commercial social applications.

Current link prediction algorithms fall into three main classes: those based on local-information similarity, those based on paths, and those based on random walks. Local-similarity algorithms are simple to compute and have low complexity; their prediction accuracy is also relatively low, yet they are currently the most widely used link prediction methods. Path-based and random-walk-based methods are more complex and have high time complexity, which limits their applicability to real social networks. Most existing local-similarity methods use only node degree for prediction and do not capture the role played by important nodes, which lowers their prediction performance. Improving link prediction accuracy is one of the fundamental problems in complex network research. This thesis studies link prediction in social networks. Its main work and contributions are as follows:

1. Some nodes that play important roles may have greater influence or a stronger ability to spread information, but existing node-similarity link prediction indices do not fully exploit node importance. To address this, a link prediction algorithm based on node importance is proposed. Building on the Common Neighbours (CN), Adamic-Adar (AA) and Resource Allocation (RA) similarity indices, the new algorithm makes full use of degree centrality, closeness centrality and betweenness centrality, yielding importance-aware CN, AA and RA similarity indices. Simulation experiments on four real datasets, with AUC as the accuracy metric, show that the improved algorithms achieve higher link prediction accuracy than the baseline algorithms on all four datasets and can analyse and predict complex network structure more accurately.
2. By mining the deeper interactions among a node's neighbours, the second-order topological information carried by common neighbours is filtered, and a method for computing a node's clustering ability is proposed.
3. Existing local-similarity link prediction indices do not fully consider node clustering information. Building on the traditional classification of network node information, second-order topological information is filtered through triangle relations, which are more stable in the network, and a link prediction method based on node clustering ability is proposed. The new algorithm makes full use of the clustering information of common neighbours, so that nodes with stronger clustering ability play a greater role in link prediction. In experiments on four real-world datasets, simulated in MATLAB and evaluated with AUC and Precision on each dataset, the improved algorithm produces more accurate predictions.
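The three local indices that contribution 1 builds on have standard definitions: for nodes x and y with neighbour sets Γ(x) and Γ(y), CN counts |Γ(x) ∩ Γ(y)|, AA sums 1/log k(z) over the common neighbours z, and RA sums 1/k(z), where k(z) is the degree of z. The Python sketch below computes them on an undirected graph stored as a dictionary of neighbour sets; the function names and this adjacency representation are illustrative choices, not the thesis's MATLAB code.

```python
import math


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbours)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def cn_score(adj, x, y):
    """Common Neighbours: |Γ(x) ∩ Γ(y)|."""
    return len(adj[x] & adj[y])


def aa_score(adj, x, y):
    """Adamic-Adar: sum of 1 / log k(z) over common neighbours z.

    For x != y every common neighbour has degree >= 2, so log k(z) > 0.
    """
    return sum(1.0 / math.log(len(adj[z])) for z in adj[x] & adj[y])


def ra_score(adj, x, y):
    """Resource Allocation: sum of 1 / k(z) over common neighbours z."""
    return sum(1.0 / len(adj[z]) for z in adj[x] & adj[y])


if __name__ == "__main__":
    adj = build_adjacency([(1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (4, 5)])
    # Nodes 1 and 4 are not linked but share neighbours 2 and 3 (degree 3 each).
    print(cn_score(adj, 1, 4), round(aa_score(adj, 1, 4), 4), round(ra_score(adj, 1, 4), 4))
    # prints: 2 1.8205 0.6667
```

All three indices look only at the two-hop neighbourhood of the pair, which is why the abstract describes local-similarity methods as cheap to compute: scoring one pair needs a single set intersection plus a sum over the common neighbours.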
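The abstract does not give the exact formula by which contribution 1 folds degree, closeness and betweenness centrality into CN, AA and RA, so the following is a hypothetical illustration only. It assumes that each common neighbour z is weighted by 1 + I(z), where I(z) is the plain mean of the three normalised centralities; the names `node_importance` and `importance_weighted_scores`, and this weighting, are assumptions rather than the thesis's definitions. Betweenness is computed with Brandes' algorithm for unweighted graphs, and closeness is computed within each node's connected component.

```python
import math
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def degree_centrality(adj):
    n = len(adj)
    return {v: len(adj[v]) / (n - 1) if n > 1 else 0.0 for v in adj}


def closeness_centrality(adj):
    # (reachable nodes - 1) / (sum of distances), within v's connected component
    result = {}
    for v in adj:
        dist = bfs_distances(adj, v)
        total = sum(dist.values())
        result[v] = (len(dist) - 1) / total if total > 0 else 0.0
    return result


def betweenness_centrality(adj):
    """Brandes' algorithm for unweighted, undirected graphs, normalised to [0, 1]."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}  # number of shortest s-v paths
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}  # dependency of source s on each node
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(adj)
    # Each unordered pair is counted twice, so divide by (n-1)(n-2), not (n-1)(n-2)/2.
    scale = 1.0 / ((n - 1) * (n - 2)) if n > 2 else 0.0
    return {v: b * scale for v, b in bc.items()}


def node_importance(adj):
    """Hypothetical importance I(z): mean of the three normalised centralities."""
    dc, cc, bc = degree_centrality(adj), closeness_centrality(adj), betweenness_centrality(adj)
    return {v: (dc[v] + cc[v] + bc[v]) / 3.0 for v in adj}


def importance_weighted_scores(adj, importance, x, y):
    """CN, AA and RA with each common neighbour z weighted by 1 + I(z) (assumed form)."""
    common = adj[x] & adj[y]
    return {
        "CN": sum(1.0 + importance[z] for z in common),
        "AA": sum((1.0 + importance[z]) / math.log(len(adj[z])) for z in common),
        "RA": sum((1.0 + importance[z]) / len(adj[z]) for z in common),
    }


if __name__ == "__main__":
    # Path graph 1 - 2 - 3: node 2 has maximal degree, closeness and betweenness.
    adj = {1: {2}, 2: {1, 3}, 3: {2}}
    print(importance_weighted_scores(adj, node_importance(adj), 1, 3))
```

The 1 + I(z) form keeps the unweighted index as a floor, so a common neighbour never contributes less than it does in plain CN, AA or RA; important hubs simply contribute more.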
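Contributions 2 and 3 weight common neighbours by their clustering ability, measured through triangles. The abstract does not state the formula, so this sketch substitutes the standard local clustering coefficient C(z) = 2T(z) / (k(z)(k(z) - 1)), where T(z) is the number of triangles through z, and scores a pair by summing C(z) over its common neighbours. This index and the function names are assumptions for illustration, not the thesis's exact clustering-ability measure.

```python
def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbours)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def triangles_through(adj, z):
    """Number of triangles containing z, i.e. links among z's neighbours."""
    nbrs = list(adj[z])
    return sum(
        1
        for i in range(len(nbrs))
        for j in range(i + 1, len(nbrs))
        if nbrs[j] in adj[nbrs[i]]
    )


def clustering_coefficient(adj, z):
    """Local clustering coefficient C(z) = 2 T(z) / (k(z) (k(z) - 1))."""
    k = len(adj[z])
    if k < 2:
        return 0.0
    return 2.0 * triangles_through(adj, z) / (k * (k - 1))


def clustering_weighted_cn(adj, x, y):
    """Hypothetical index: sum of C(z) over common neighbours z of x and y.

    Common neighbours embedded in many triangles contribute more; a common
    neighbour whose own neighbours are mutually unlinked contributes nothing.
    """
    return sum(clustering_coefficient(adj, z) for z in adj[x] & adj[y])


if __name__ == "__main__":
    adj = build_adjacency([(1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (4, 5)])
    # Common neighbours of (1, 4) are 2 and 3, each with C = 2/3.
    print(round(clustering_weighted_cn(adj, 1, 4), 4))  # prints 1.3333
```

Summing C(z) rather than counting common neighbours is one simple way to let triangle-rich neighbours dominate the score; other filters over the second-order structure are equally plausible readings of the abstract.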
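AUC, the accuracy metric used in the experiments, is commonly defined in the link prediction literature as follows: the observed links are split into a training set and a probe set, every pair is scored on the training graph only, and the score of a randomly chosen probe link is compared with that of a randomly chosen nonexistent link n times. If the probe link scores higher n' times and ties n'' times, AUC = (n' + 0.5n'')/n, so 0.5 corresponds to random guessing. The Python sketch below follows that definition; the function names, probe fraction and fixed seed are illustrative assumptions, not the thesis's experimental settings.

```python
import itertools
import random


def split_edges(edges, probe_fraction=0.1, seed=0):
    """Randomly split observed edges into (training, probe) lists."""
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)
    k = max(1, int(len(shuffled) * probe_fraction))
    return shuffled[k:], shuffled[:k]


def nonexistent_pairs(nodes, edges):
    """All unordered node pairs not linked in the full network (nodes must be sortable)."""
    present = {frozenset(e) for e in edges}
    return [
        (u, v)
        for u, v in itertools.combinations(sorted(nodes), 2)
        if frozenset((u, v)) not in present
    ]


def auc(score, probe_edges, absent_pairs, n_samples=None, seed=0):
    """AUC = (n' + 0.5 n'') / n.

    With n_samples=None every (probe, absent) combination is compared exactly;
    otherwise n_samples random comparisons are drawn.
    """
    if n_samples is None:
        comparisons = itertools.product(probe_edges, absent_pairs)
        n = len(probe_edges) * len(absent_pairs)
    else:
        rng = random.Random(seed)
        comparisons = (
            (rng.choice(probe_edges), rng.choice(absent_pairs)) for _ in range(n_samples)
        )
        n = n_samples
    higher = ties = 0
    for missing, absent in comparisons:
        s_missing, s_absent = score(*missing), score(*absent)
        if s_missing > s_absent:
            higher += 1
        elif s_missing == s_absent:
            ties += 1
    return (higher + 0.5 * ties) / n


if __name__ == "__main__":
    edges = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (4, 5)]
    train, probe = split_edges(edges, probe_fraction=0.2)
    adj = {}
    for u, v in train:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    nodes = {u for e in edges for u in e}

    def cn(x, y):
        # Common Neighbours on the training graph; nodes isolated by the split get no neighbours.
        return len(adj.get(x, set()) & adj.get(y, set()))

    print(auc(cn, probe, nonexistent_pairs(nodes, edges)))
```

Exact comparison is practical only for small graphs; on larger datasets the sampled mode keeps the cost at n score evaluations per index.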
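Contribution 3 also reports Precision. Under its usual definition, all pairs that are unlinked in the training graph are ranked by score, the top L are taken as predictions, and Precision = L_r / L, where L_r is the number of those L pairs that appear in the probe set. The sketch below assumes that definition; the value of L and the tie-breaking rule are assumptions, since the abstract specifies neither.

```python
def precision_at_l(score, candidate_pairs, probe_edges, top_l):
    """Precision = L_r / L over the top_l highest-scoring candidate pairs.

    candidate_pairs: pairs unlinked in the training graph (probe links included).
    Equal scores keep their input order because sorted() is stable, even with reverse=True.
    """
    probe = {frozenset(e) for e in probe_edges}
    ranked = sorted(candidate_pairs, key=lambda p: score(*p), reverse=True)
    hits = sum(1 for pair in ranked[:top_l] if frozenset(pair) in probe)
    return hits / top_l


if __name__ == "__main__":
    scores = {frozenset((1, 3)): 3, frozenset((2, 4)): 1, frozenset((1, 4)): 2}

    def score(u, v):
        return scores[frozenset((u, v))]

    # Top-2 predictions are (1, 3) and (1, 4); only (1, 3) is a probe link.
    print(precision_at_l(score, [(1, 3), (2, 4), (1, 4)], [(1, 3), (2, 4)], top_l=2))  # prints 0.5
```

Unlike AUC, Precision only rewards the head of the ranking, so the two metrics together indicate both overall ranking quality and the quality of the few predictions a recommender would actually show.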
【Degree-granting institution】: Xinjiang University
【Degree level】: Master's
【Year of award】: 2017
【CLC number】: TP393.09
