Research on Privacy Protection Methods for Weighted Social Network Publication Based on a Vector Model

Published: 2018-01-23 15:15

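  Keywords: weighted social network; privacy protection; vector model; edge space; differential privacy; random projection. Source: Jiangsu University, doctoral dissertation, 2015. Document type: dissertation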


[Abstract]: A social network is a relatively stable system of relationships formed through interaction among social individuals. It is a representation model for many social phenomena and one of the most representative real-world complex networks. As the number of social networking services grows, more and more individuals register on them, and large amounts of personal information are collected. To support scientific research, data sharing, and similar needs, data collectors have to publish social network datasets. Because these datasets contain sensitive information about individuals, publication exposes individuals to the risk of privacy disclosure. As public awareness of privacy rises, privacy disclosure has become the main obstacle to data publication, so privacy protection must be applied when a social network is published.

Most existing research on privacy-preserving social network publication targets unweighted networks. In an unweighted network, a connection between individuals is a Boolean relation: it records only whether an interaction exists and cannot express differences in the strength of that interaction. A growing body of empirical research shows that connections between social individuals differ in coupling strength and are not purely Boolean. The closeness of relationships between people, bandwidth on the Internet, the number of flights or seats between airports in an airline network, and the number of collaborations between scientists in a collaboration network are all important factors affecting network properties. It is therefore necessary to introduce into the topology of a social network a physical quantity that measures the degree of coupling between nodes, that is, to attach a weight to the edge between two nodes to measure the strength of their relationship. Because of these edge weights, a weighted social network carries richer information than an unweighted one, so studying privacy protection for weighted social network publication is both necessary and meaningful. For weighted social networks, this dissertation proposes privacy protection methods built on a vector-model-based local perturbation strategy for data publication. The specific contributions are as follows.

(1) Publication scenarios for weighted social networks are delimited by two performance indicators, privacy protection quality and published data utility, and each scenario is defined concretely. Before privacy protection can be applied to social network publication, the publication scenario must be determined: the attacker's background knowledge, the intended use of the published data, and the private information to be protected must all be made explicit before an effective protection strategy can be chosen and a method designed. For social network publication, the two key indicators of a privacy protection method's performance are privacy protection quality and published data utility. Depending on the characteristics of the data and the actual publication requirements, a data publisher may face three choices: first, to maximize published data utility subject to acceptable privacy protection quality; second, to maximize privacy protection quality subject to acceptable published data utility; third, to balance the two and seek a trade-off. The dissertation defines one publication scenario for each of these three choices. In every scenario, the nodes of the weighted social network (including the weights of the edges between nodes) are the private information, the published data is intended for analysis of network structural characteristics (with emphasis on average path length, average clustering coefficient, and weight distribution), and the attacker is assumed to hold one of three kinds of background knowledge about nodes (degree, subgraph, or edge weight).

(2) Vectors are proposed as the publication model for weighted social networks. Building on the edge space theory of graphs, the weighted social network is described by vectors. To reduce vector dimensionality, the vector model is constructed with two node-based partitioning methods, random partitioning and clustering-based partitioning. Partitioning represents the weighted social network as a number of subgraphs, each subgraph is represented by a vector, and the set of all subgraph vectors serves as the publication model. Compared with a dense graph on the same number of nodes, each partitioned subgraph is sparse. Perturbing the vectors of the partitioned subgraphs implements a local perturbation strategy on the weighted social network and thereby protects privacy in its publication.

(3) For the requirement of improving published data utility, a random perturbation method based on vector similarity is proposed. The method uses weighted Euclidean distance as the measure of vector similarity and builds a publication candidate set for each subgraph vector according to a threshold chosen by the publisher. Vectors are then selected at random from the subgraph candidate sets to form the publication vector set of the weighted social network, and the final published network is constructed from that vector set. The method forces an attacker to attempt re-identification within a very large result set in which every vector is equally likely, which increases the uncertainty of identification. It also increases the similarity among the vectors in each subgraph's candidate set, which keeps the published network as similar as possible to the original and improves published data utility.

(4) For the requirement of improving privacy protection quality, a vector mapping method based on the differential privacy model is proposed. The method exploits the strong protection that differential privacy offers and designs a query model for weighted social networks, WSQuery, that satisfies differential privacy. WSQuery captures the structure of the weighted social network and returns a sequence of ordered triples as the query result set. Based on WSQuery, an algorithm that satisfies differential privacy, WSPA, is designed. WSPA maps the query result set to a real-valued vector and protects privacy by injecting Laplace noise into the vector. To address the relatively high error of WSPA, an improved algorithm, LWSPA, is proposed. LWSPA partitions the triple sequence in the query result set and constructs a differentially private algorithm for each subsequence. This lowers the error, meets the utility requirements of the published data, and improves privacy protection quality.

(5) For the requirement of a trade-off between privacy protection quality and published data utility, a vector mapping method based on random projection is proposed. The method describes the weighted social network with high-dimensional vectors and applies the low-distortion mapping of random projection to reduce the original high-dimensional vector set to a low-dimensional target vector set. Dimensionality reduction both removes redundancy and, through the numerical distortion introduced by the transformation, protects private information. Building on the basic random projection method for vector sets, an improved method is proposed to prevent reconstruction of the original dataset if the random projection matrix is leaked. It constructs the elements of the random matrix from a combination of two random functions, and the random mapping realized with this matrix is proved to satisfy the conditions of the Johnson-Lindenstrauss lemma. The method raises privacy protection quality while retaining high published data utility, achieving a trade-off between the two.

(6) The three proposed vector-model-based privacy protection methods were evaluated in simulation experiments on six real datasets. Their performance was analyzed through experimental comparison with existing algorithms, and the effectiveness of the proposed methods was verified. The execution time of the algorithms implementing the three methods was analyzed. Six algorithms related to the three proposed methods were selected and compared experimentally under concrete privacy attacks. Privacy protection quality was tested with node identification attacks based on three kinds of background knowledge (degree, subgraph, and weight), and published data utility was tested with three structural characteristic parameters (average shortest path, average clustering coefficient, and weight distribution). The experimental results and analysis show that the three proposed methods meet the requirements of their respective publication scenarios and balance privacy protection quality against published data utility reasonably well.
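The building blocks named in the abstract can be illustrated with a minimal Python sketch. It is not the dissertation's WSQuery, WSPA, LWSPA, or improved random projection construction, none of which are specified on this page. All function names are hypothetical, the sensitivity value is an assumption supplied by the caller, and the projection uses a plain Gaussian random matrix rather than the two-random-function matrix the abstract mentions.

```python
import math
import random
from itertools import combinations


def edge_vector(nodes, weights):
    """Describe a weighted (sub)graph as a vector with one entry per node pair.

    `weights` maps frozenset({u, v}) to an edge weight; missing pairs get 0.0.
    """
    pairs = combinations(sorted(nodes), 2)
    return [float(weights.get(frozenset(p), 0.0)) for p in pairs]


def weighted_euclidean(x, y, w):
    """Weighted Euclidean distance, usable as a vector similarity measure."""
    return math.sqrt(sum(wi * (xi - yi) ** 2 for xi, yi, wi in zip(x, y, w)))


def perturb_laplace(vec, sensitivity, epsilon, rng):
    """Add Laplace(0, sensitivity / epsilon) noise to every entry.

    The difference of two i.i.d. exponential samples is Laplace distributed.
    `sensitivity` is an assumed input, not derived from any query model here.
    """
    rate = epsilon / sensitivity  # rate = 1 / scale
    return [x + rng.expovariate(rate) - rng.expovariate(rate) for x in vec]


def random_projection(vectors, k, rng):
    """Project d-dimensional vectors to k dimensions with a Gaussian matrix.

    Entries are N(0, 1) / sqrt(k), the basic Johnson-Lindenstrauss style map.
    """
    d = len(vectors[0])
    matrix = [[rng.gauss(0.0, 1.0) / math.sqrt(k) for _ in range(d)]
              for _ in range(k)]
    return [[sum(row[j] * v[j] for j in range(d)) for row in matrix]
            for v in vectors]


if __name__ == "__main__":
    rng = random.Random(0)
    nodes = ["a", "b", "c"]
    weights = {frozenset(("a", "b")): 2.0, frozenset(("b", "c")): 5.0}
    vec = edge_vector(nodes, weights)
    print(vec)
    print(perturb_laplace(vec, sensitivity=1.0, epsilon=0.5, rng=rng))
    print(random_projection([vec], k=2, rng=rng))
```

In this sketch a smaller `epsilon` gives a larger noise scale and so stronger perturbation, while a larger `k` keeps more of the pairwise distance structure after projection. These are the two knobs that correspond loosely to the quality versus utility choices described in contributions (4) and (5).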

[Degree-granting institution]: Jiangsu University
[Degree level]: Doctoral
[Year degree granted]: 2015
[Classification number]: TP309

