支持差分隐私保护模型的数据发布算法研究
发布时间:2018-03-12 06:19
本文选题:数据发布 切入点:隐私保护 出处:《大连海事大学》2017年硕士论文 论文类型:学位论文
【摘要】:进入信息时代后,众多的服务提供商积累了大量的用户数据。数据共享可以避免由于雪藏数据带来的浪费,但是共享的数据往往涉及用户的隐私。因此,数据发布过程中的用户隐私保护问题逐渐受到了学术界和工业界的关注。差分隐私保护模型以其优秀的表现,被应用到众多领域。本文研究满足差分隐私保护模型的数据发布算法,并针对数据类型的不同,提出不同的解决方案。本文的研究目的是在保证发布算法满足差分隐私保护模型的基础上,提高发布数据的准确性。针对数值型数据,本文提出了基于有序划分直方图的数据发布算法。该算法对直方图区间做第一重变换后,在保护直方图结构信息的前提下对区间顺序进行调整,借此消除了可能出现的"分区截断"问题。同时,本文使用基于贪婪思想的分区算法,综合考虑分区后添加的Laplace噪声对分区方案的影响。这避免了对分区个数的事先约定,提高了算法对不同数据集的适用性。本文提出的算法在保证发布的直方图满足差分隐私保护模型的基础上,提高了作用于该直方图的范围查询的准确性。针对用复杂图表示的关联数据,本文提出了基于层次随机图分治的关联数据发布算法。本文的出发点是用户对这类数据的兴趣点往往集中在社区内。所以,结合社区发现的相关知识,本文提出对复杂图中的边进行不同程度的差分隐私保护的策略。本文提出的算法基于分治的思想,对社区进行层次随机图构建,然后合并子层次随机图。这解决了复杂图规模较大时,原始算法效率低下的问题。本文提出的算法对复杂图进行分块处理,并对数据发布的隐私性和数据可用性进行了权衡。在实验环节,按照节点度的分布和最短路径分布等衡量标准,本文的算法更好地保持了原始复杂图的性质。
[Abstract]:After entering the information age, a large number of service providers have accumulated a large amount of user data. Data sharing can avoid waste caused by snow storage data, but the shared data often involves users' privacy. In the process of data release, the problem of user privacy protection has been paid more and more attention by academia and industry. It has been applied to many fields. In this paper, we study the data publishing algorithm which satisfies the differential privacy protection model, and aim at the different data types. Different solutions are proposed. The purpose of this paper is to improve the accuracy of published data on the basis of ensuring that the publishing algorithm meets the differential privacy protection model. In this paper, a data publishing algorithm based on ordered partitioning histograms is proposed. After the first transformation of histogram interval, the interval order is adjusted under the premise of preserving the histogram structure information. In this way, the problem of partition truncation may be eliminated. At the same time, this paper uses the partition algorithm based on greedy thought to consider the effect of the Laplace noise added after partition on the partition scheme, which avoids the prior agreement on the number of partitions. The algorithm proposed in this paper ensures that the published histogram satisfies the differential privacy protection model, and improves the applicability of the algorithm to different data sets. Improves the accuracy of the range query that acts on the histogram. This paper proposes an algorithm for publishing associated data based on hierarchical random graph partition and conquer. The starting point of this paper is that the user's interest in this kind of data is often concentrated in the community. In this paper, a differential privacy protection strategy for edge in complex graphs is proposed. Based on the idea of divide-and-conquer, the proposed algorithm is used to construct hierarchical random graph of community. Then the sub-hierarchical random graph is merged. This solves the problem of low efficiency of the original algorithm when the scale of complex graph is large. The algorithm proposed in this paper deals with the complex graph in blocks. In the experiment, according to the distribution of node degree and the distribution of the shortest path, the algorithm of this paper can better maintain the nature of the original complex graph.
【学位授予单位】:大连海事大学
【学位级别】:硕士
【学位授予年份】:2017
【分类号】:TP309
【参考文献】
相关期刊论文 前5条
1 张啸剑;邵超;孟小峰;;差分隐私下一种精确直方图发布方法[J];计算机研究与发展;2016年05期
2 夏小玲;刘慧艺;;面向数据流的差分隐私直方图发布[J];计算机与现代化;2016年02期
3 张啸剑;孟小峰;;基于差分隐私的流式直方图发布方法[J];软件学报;2016年02期
4 吴英杰;陈鸿;王一蕾;孙岚;;面向任意区间树结构的差分隐私直方图发布算法[J];模式识别与人工智能;2015年12期
5 熊平;朱天清;王晓峰;;差分隐私保护及其应用[J];计算机学报;2014年01期
,本文编号:1600293
本文链接:https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/1600293.html