大规模网络中抽样策略与应用研究

发布时间：2018-06-21 06:03

本文选题：复杂网络 + 网络抽样　；参考：《电子科技大学》2015年硕士论文

【摘要】：在复杂网络科学领域,人们通过对复杂网络的静态性质和动态特征的研究,提出了很多刻画复杂网络的指标和分析计算方法。然而,随着网络规模的不断增长,这其中有很多算法由于时空复杂度过高而不再适用于大规模网络上的计算。解决这个问题的方法之一便是利用网络抽样,在抽样数据集上以较低的资源消耗获得近似解。然而,现有的抽样算法大部分是基于单阶段抽样框架,往往难以精确刻画具有多重分布特性的复杂网络样本,无法满足具体领域研究的需要。本文在国内外相关研究基础上,针对大规模网络中的最短路径计算和大规模社交网络中的二元关系预测提出了网络抽样策略并设计了相应的快速算法。最短路径是很多网络分析算法的基础,然而经典算法由于复杂度过高,往往无法处理大规模的网络。我们在大规模网络的最短路径计算中引入了中心点抽样,提出了有效计算最短路径的近似算法。本算法利用复杂网络中普遍存在的等级结构,抽取网络中的若干中心节点并与周围的邻居节点进行凝聚形成超级节点,构成新的上层网络,并在上层网络上重复中心节点与邻居节点的凝聚过程,直到最上层网络中的节点个数低于某一阈值时结束。最后利用Dijkstra算法精确地计算最上层网络中节点间的最短路径,并根据凝聚得到的等级网络结构来估计原始网络中节点之间的最短距离。实验结果表明我们的算法在精度和效率上都明显优于当前现有的算法。在大规模社交网络中,用户之间的二元关系对社交网络分析有着很高价值。然而,随着网络规模的不断增加,用户之间的关系越来越复杂,也越来越隐蔽,如何高效准确地推断用户之间的二元关系逐渐成为现在的研究热点。在本文中,我们基于社会心理学理论提出了一种适合于大规模网络二元关系预测的算法。我们设计了分段训练框架将训练集分为两个子集,并在每个子集上分别训练SVM分类器。为了应对大规模训练数据带来的算法效率下降问题,我们引入了有效的抽样策略,利用不到原网络1%的数据样本,构建了具有高预测准确率的模型。我们与现有的算法以及它们的集成算法(AdaBoost)进行了比较,在公共标准数据集上的实验结果表明,我们的算法在大规模社交网络上的预测准确率更高。
[Abstract]:In the field of complex network science, through the study of static and dynamic characteristics of complex networks, many indexes and analytical methods are proposed to describe complex networks. However, with the increasing of network size, many of these algorithms are no longer suitable for large-scale networks due to their high space-time complexity. One of the methods to solve this problem is to use network sampling to obtain approximate solution on the sample data set with lower resource consumption. However, most of the existing sampling algorithms are based on the single-stage sampling frame, and it is often difficult to accurately depict complex network samples with multiple distribution characteristics, which can not meet the needs of the research in specific fields. On the basis of domestic and foreign research, this paper proposes a network sampling strategy and designs a fast algorithm for computing the shortest path in large-scale networks and predicting binary relationships in large-scale social networks. The shortest path is the basis of many network analysis algorithms, but the classical algorithms are often unable to deal with large-scale networks because of their high complexity. The center sampling is introduced into the calculation of the shortest path of the large-scale network, and an approximate algorithm for calculating the shortest path is proposed. Based on the hierarchical structure of complex network, this algorithm extracts some central nodes from the network and condenses with the neighboring nodes to form a super node to form a new upper layer network. The condensation process between central node and neighbor node is repeated on the upper layer network until the number of nodes in the upper layer network is lower than a certain threshold. Finally, Dijkstra algorithm is used to accurately calculate the shortest path between nodes in the upper layer of the network, and the shortest distance between nodes in the original network is estimated according to the hierarchical network structure. The experimental results show that our algorithm is superior to the existing algorithms in accuracy and efficiency. In large-scale social networks, the binary relationship between users is of great value to the analysis of social networks. However, with the increasing of network scale, the relationship between users is becoming more and more complex and covert. How to infer the binary relationship between users efficiently and accurately has gradually become a hot research topic. In this paper, based on the theory of social psychology, we propose an algorithm for predicting binary relationships in large-scale networks. We design a segmented training framework that divides the training set into two subsets and trains SVM classifiers on each subset. In order to deal with the problem of reduced algorithm efficiency brought about by large-scale training data, we introduce an effective sampling strategy and construct a model with high prediction accuracy by using less than 1% of the data samples from the original network. Compared with the existing algorithms and their ensemble algorithms AdaBoost. the experimental results on common standard datasets show that our algorithm has higher prediction accuracy on large-scale social networks.
【学位授予单位】：电子科技大学
【学位级别】：硕士
【学位授予年份】：2015
【分类号】：O157.5

【参考文献】