基于虚拟坐标的社交网络距离估计改进算法研究

发布时间：2019-01-28 18:35

【摘要】：随着互联网技术的不断发展,社交网络的发展也如火如荼,衍生出一系列社交网络应用。这些应用大多以网络节点之间的距离信息为基础,社交网络中的距离是对用户之间逻辑关系的一种描述,与物理距离无关,一般使用节点间最短路径的长度表示。随着现代社交网络的规模日渐庞大,使用传统方法对网络中节点间距离进行精确计算所花费代价已难以承受。现有的大规模社交网络中节点间距离估计算法大多采用将社交网络嵌入低维空间的技术,也就是利用参考节点进行坐标分配的方法实现距离估计。但现有的方法中存在一个共性问题,在为普通节点分配坐标时,仅仅使用了参考节点中的一少部分用于坐标分配,影响了距离估计算法的精度。本文针对现有方法的不足进行研究,提出了基于坐标嵌入技术的距离估计算法SDE(ShortestPath Distance Estimation)。该算法选择将社交网络嵌入不受对称性约束的内积空间,使其能同时适用于无向图与有向图进行距离估计。为提高距离估计的精确度,该算法使用全部的参考节点信息为节点分配坐标,但这就存在会过多增加计算压力的问题。为了解决这一问题,本文提出了基于组矩阵分解的坐标分配算法RGMD(Robust Group Matrix Decomposition)。RGMD借鉴了鲁棒性主成分分析提取低秩矩阵、去除异常值的思想,将坐标分配过程转化为抽取低秩矩阵的优化问题。RGMD算法采用把所有参考节点分成若干组的方式,通过组与组之间并行计算完成对其它节点的坐标分配。这种方法将坐标分配算法的优化过程由对单一变量的优化转化为对一组变量的同时优化,不仅将坐标分配算法的计算压力从指数增加降低到常数倍增加,而且将坐标分配与坐标校正两个过程合二为一,进一步提高了距离估计算法的精确度。本文在6个开源的真实社交网络数据集上对算法的性能进行了验证,主要从算法的精确度、收敛速度以及对参数的敏感性等方面展开。除此之外,还实现了4个社交网络中常用的基于距离计算的应用,从应用的角度对算法的性能进行验证。实验结果表明,相较于已有的算法,本文提出的算法虽然时间开销略有增加,但在精确度方面有大幅度提升。
[Abstract]:With the continuous development of Internet technology, the development of social networks is in full swing, derived a series of social network applications. Most of these applications are based on the distance information between network nodes. The distance in social network is a description of the logical relationship between users, which is independent of physical distance, and is usually expressed by the shortest path length between nodes. With the increasing scale of modern social networks, the cost of accurately calculating the distance between nodes in the network by traditional methods has become unbearable. Most of the existing algorithms for estimating distance between nodes in large-scale social networks are based on the technique of embedding social networks into low-dimensional space, that is, using reference nodes to allocate coordinates to realize distance estimation. However, there is a common problem in the existing methods. Only a few of the reference nodes are used for coordinate assignment, which affects the accuracy of the distance estimation algorithm. In this paper, a distance estimation algorithm, SDE (ShortestPath Distance Estimation)., based on coordinate embedding technique, is proposed to study the shortcomings of the existing methods. The algorithm chooses to embed the social network into the inner product space which is not constrained by symmetry so that it can be used to estimate the distance between the undirected graph and the directed graph at the same time. In order to improve the accuracy of distance estimation, the algorithm allocates coordinates to the nodes using all the reference node information, but there is a problem that the calculation pressure will be increased too much. In order to solve this problem, a coordinate allocation algorithm based on group matrix decomposition (RGMD (Robust Group Matrix Decomposition). RGMD) is proposed in this paper, which uses robust principal component analysis (PCA) to extract low rank matrix and remove outliers. The coordinate assignment process is transformed into an optimization problem of extracting low rank matrix. RGMD algorithm divides all reference nodes into several groups and accomplishes coordinate assignment to other nodes by parallel computation between groups. This method transforms the optimization process of coordinate allocation algorithm from the optimization of a single variable to the optimization of a set of variables at the same time. It not only reduces the computational pressure of the coordinate allocation algorithm from exponential to constant times, In addition, coordinate assignment and coordinate correction are combined to improve the accuracy of the distance estimation algorithm. In this paper, the performance of the algorithm is verified on six open source real social network datasets, mainly from the accuracy, convergence speed and sensitivity to parameters of the algorithm. In addition, four commonly used distance computing applications in social networks are implemented, and the performance of the algorithm is verified from the application point of view. The experimental results show that, compared with the existing algorithms, the proposed algorithm increases the time cost slightly, but greatly improves the accuracy.
【学位授予单位】：哈尔滨工业大学
【学位级别】：硕士
【学位授予年份】：2017
【分类号】：TP393.09

【参考文献】