Research on k-Core Subgraph Query Algorithms

Published: 2018-04-23 17:34

Topic: k-core subgraph + global search; Source: Yanshan University, 2016 master's thesis

[Abstract]: Given a graph G, a query node v, and a user-specified value k, a k-core subgraph query returns a subgraph of G that contains v and in which every node has degree greater than or equal to k. k-core subgraphs are mainly applied in friend recommendation, advertising in social networks, disease control, and semantic expansion. This thesis studies the k-core subgraph query problem in depth from the following aspects. First, an analysis of existing k-core subgraph query processing methods shows that existing algorithms perform redundant traversals when generating a k-core subgraph. Second, the pre_CST algorithm is proposed to preprocess the graph. It sorts the neighbors of each node by degree and records the number of neighbors whose degree is greater than or equal to k, to facilitate the subsequent k-core subgraph computation. Third, an efficient k-core subgraph algorithm, CST, is proposed. Making full use of the preprocessing information, it introduces three strategies to avoid redundant traversal: (1) while traversing the neighbors of the current node, if a neighbor with degree less than k is found, none of the nodes after it need to be visited; (2) when the number of neighbors with degree greater than or equal to k is less than k, the current node is not added to the candidate subgraph; (3) a priority queue orders the candidate nodes to be added to the k-core subgraph, so that the candidate with the most edges to the current k-core is added first, which reduces the number of useless nodes added during the query. All three strategies avoid processing useless nodes, thereby reducing redundant traversal and improving query efficiency. Finally, the efficiency of the proposed algorithms is verified on real datasets using several evaluation metrics.
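The query definition and pruning strategies (1) and (2) can be illustrated with a minimal Python sketch. This is not the thesis's pre_CST or CST implementation: the function names, the dict-of-sets graph representation, and the use of standard k-core peeling in place of the priority-queue expansion of strategy (3) are all assumptions made for illustration. The graph is assumed to be undirected and simple.

```python
from collections import deque


def preprocess(adj):
    """Sort every neighbor list by descending degree (the pre_CST idea, assumed form)."""
    deg = {u: len(ns) for u, ns in adj.items()}
    sorted_adj = {u: sorted(ns, key=lambda w: -deg[w]) for u, ns in adj.items()}
    return deg, sorted_adj


def qualified_neighbors(u, k, deg, sorted_adj):
    """Strategy (1): stop scanning at the first neighbor whose degree is below k."""
    out = []
    for w in sorted_adj[u]:
        if deg[w] < k:
            break  # every later neighbor has an even smaller degree
        out.append(w)
    return out


def k_core_query(adj, v, k):
    """Return the nodes of the connected k-core containing v, or an empty set."""
    if v not in adj:
        return set()
    deg, sorted_adj = preprocess(adj)

    # Collect candidates reachable from v, pruning with strategies (1) and (2).
    cand, seen, queue = set(), {v}, deque([v])
    while queue:
        u = queue.popleft()
        nbrs = qualified_neighbors(u, k, deg, sorted_adj)
        if len(nbrs) < k:
            continue  # strategy (2): u cannot belong to any k-core
        cand.add(u)
        for w in nbrs:
            if w not in seen:
                seen.add(w)
                queue.append(w)

    # Standard peeling inside the candidate set (stand-in for strategy (3)).
    local = {u: sum(1 for w in adj[u] if w in cand) for u in cand}
    stack = [u for u in cand if local[u] < k]
    removed = set(stack)
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w in cand and w not in removed:
                local[w] -= 1
                if local[w] < k:
                    removed.add(w)
                    stack.append(w)

    core = cand - removed
    if v not in core:
        return set()

    # Keep only the connected part that contains the query node.
    comp, queue = {v}, deque([v])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w in core and w not in comp:
                comp.add(w)
                queue.append(w)
    return comp


# Hypothetical example: a 4-clique {1, 2, 3, 4} with a tail 4 - 5 - 6.
adj = {
    1: {2, 3, 4},
    2: {1, 3, 4},
    3: {1, 2, 4},
    4: {1, 2, 3, 5},
    5: {4, 6},
    6: {5},
}
print(k_core_query(adj, 1, 3))
```

For this example graph the query on node 1 with k = 3 yields the clique {1, 2, 3, 4}; node 5 is never expanded, because the scan of node 4's sorted neighbor list stops as soon as it reaches a neighbor of degree below 3.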
[Degree-granting institution]: Yanshan University
[Degree level]: Master's
[Year degree awarded]: 2016
[Classification number]: O157.5;TP391.3
