Research on Data Anonymization Techniques for Data Publishing

Published: 2018-06-23 11:05

Topics: data anonymization + data privacy; Source: Southeast University, doctoral dissertation, 2016

[Abstract]: Data anonymization is a privacy-protection technique based on obfuscation and hiding. Existing techniques anonymize data mainly through generalization and suppression. They reduce the probability that an attacker learns a user's identity or sensitive information, and so protect user privacy while preserving the truthfulness and utility of the data. At present, data anonymization is mainly used to address privacy protection in applications such as data publishing, location-based services, social networks, and data querying. Among these, data anonymization for data publishing is currently the key technique for sharing data securely. In practice, however, it still has many shortcomings. This dissertation therefore studies data anonymization for data publishing, focusing on several problems that arise in its practical application.

Current data anonymization techniques for data publishing have problems with three kinds of data: high-dimensional data, data with many missing values, and complex relational data. First, when processing high-dimensional data, existing anonymization algorithms run into the "curse of dimensionality" and lose most of the information, so that data utility approaches zero. Second, when processing data with missing values, existing algorithms let the missing values contaminate other data, which raises the proportion of missing data and greatly reduces data utility. Finally, when processing complex relational data in which one user has multiple records, existing techniques suffer from model-applicability problems that lead to weakened anonymity or severe information loss. New anonymization algorithms and models are therefore needed to meet these challenges.

To solve these problems, the dissertation proposes anonymized-publishing techniques for each kind of data. First, for high-dimensional data, it proposes an anonymization algorithm based on natural equivalence classes and an anonymization algorithm based on semi-partitioning. These optimize the anonymization partitioning strategy by exploiting natural equivalence classes and by accommodating outliers, respectively, which reduces information loss and improves data utility. Second, for data with many missing values, it proposes a clustering-based anonymization algorithm and a column-isolation anonymization algorithm. These use row-wise and column-wise isolation strategies to reduce contamination by missing data and improve data utility. Finally, for complex relational data, it proposes the (k,l)-diversity model, which protects user privacy by constraining relational and transaction data at the same time, and gives the APA, PAA, and 1M-Generalization algorithms that satisfy this model. These algorithms combine relational and transaction-data anonymization techniques to reduce information loss and improve data utility.

Building on this theoretical work, the dissertation designs and develops PPDPES, a prototype system for anonymized data publishing and evaluation. PPDPES integrates anonymized publishing and evaluation functions for high-dimensional data, data with many missing values, and complex relational data, and an application example is given. Compared with existing work, the proposed data anonymization techniques are more sound, more widely applicable, and yield higher data utility, which is of significant value for privacy-preserving anonymized data publishing.
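The generalization and suppression steps that the abstract names as the basis of existing techniques can be illustrated with a minimal sketch. This is not one of the dissertation's algorithms. It assumes a toy table of (age, postal code, diagnosis) records, treats age and postal code as quasi-identifiers, and uses the standard k-anonymity condition (every quasi-identifier combination appears at least k times). The function names `generalize_age`, `generalize_zip`, `anonymize`, and `is_k_anonymous` and the sample records are hypothetical.

```python
from collections import Counter


def generalize_age(age, width=10):
    """Generalization: replace an exact age with a range such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalize_zip(zip_code, keep=3):
    """Generalization: keep the first `keep` characters and mask the rest."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)


def anonymize(records, k):
    """Generalize quasi-identifiers, then suppress equivalence classes smaller than k."""
    generalized = [
        (generalize_age(age), generalize_zip(zip_code), diagnosis)
        for age, zip_code, diagnosis in records
    ]
    # An equivalence class is the set of rows sharing the same quasi-identifiers.
    sizes = Counter((age, zip_code) for age, zip_code, _ in generalized)
    # Suppression: drop rows whose class is too small to hide an individual.
    return [row for row in generalized if sizes[(row[0], row[1])] >= k]


def is_k_anonymous(rows, k):
    """Check that every quasi-identifier combination occurs at least k times."""
    sizes = Counter((age, zip_code) for age, zip_code, _ in rows)
    return all(size >= k for size in sizes.values())


# Hypothetical sample data: (age, postal code, diagnosis).
records = [
    (34, "21001", "flu"),
    (36, "21004", "asthma"),
    (31, "21009", "flu"),
    (52, "30012", "diabetes"),
]
released = anonymize(records, k=2)
```

In this sample the fourth record forms an equivalence class of size one and is suppressed, which shows in miniature the information loss the abstract describes: with more quasi-identifier columns, fewer records share a class, so more must be generalized further or dropped.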
[Degree-granting institution]: Southeast University
[Degree level]: Doctoral
[Year conferred]: 2016
[Classification number]: TP309

[Similar literature]