Research on Anonymization Techniques for Data Publishing in ERP Information Systems
Published: 2018-09-01 12:50
[Abstract]: With the rapid development of the Internet and other information technologies, every field now holds massive amounts of data, and the capacity of all industries to collect, analyze, and mine data has improved greatly. Data mining in particular helps uncover the value hidden in data. An important step in data mining is data publishing, which draws on third-party technology and public expertise to extract the value of data more fully and to better support strategic decisions. Data publishing, however, faces a major problem: privacy leakage and information security, which has become a bottleneck for the further development of data analysis and mining. To publish data without leaking user privacy, the usual practice is to replace the attribute or attributes that uniquely identify an individual with meaningless symbols. This offers weak protection, because an attacker can use background knowledge and other available information to identify a user and then obtain that user's sensitive information. Academia has proposed many techniques and methods in response, among which anonymization is a classic privacy-protection approach. Because the internal data of ERP information systems is highly authentic and of good quality, it has very high value for publishing and analysis. Taking ERP information systems as the setting, this thesis studies privacy-protection methods for publishing their data. The main work and contributions are as follows.

First, a privacy-protection method based on k-anonymity is proposed for an attack model constructed for ERP information. The experimental dataset SAP GBI 2.3 is analyzed and, taking into account the general characteristics of ERP information system data, a sales-order-based attack model is built on a suitable data structure and related assumptions, and a practical data-utility metric is introduced. A k-anonymity algorithm based on weighted matching is then developed for this attack model, and its effectiveness and superiority are verified by comparison with two other algorithms, using the data-utility metric as the yardstick. The key contribution is a generally applicable sales-order-based attack model tailored to the data characteristics of ERP information systems, together with an efficient anonymization algorithm built on it.

Second, a new data structure and anonymization method are proposed for an ERP information system in a specific domain, the railway ERP system. Given the diversity of railway ERP data, and considering users' social-network and geographic-location information together, a hypergraph-based geo-social network (GSN) model is proposed. On this basis, an attack model and an anonymity model are constructed, several data-utility metrics are defined, and (κ, m)-anonymity and (κ, m, l)-anonymity algorithms for geo-social networks are developed. Extensive experiments evaluate the results at different stages against the data-utility metrics and verify the effectiveness of the algorithms. The key contribution is a hypergraph-based GSN model suited to the complex characteristics of railway ERP data, together with practical attack and anonymity models, reliable data-utility metrics, and a fairly complete set of anonymization algorithms.
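The abstract's point that replacing identifiers with meaningless symbols is not enough, and the k-anonymity property meant to address it, can be illustrated with a minimal Python sketch. This is not the thesis's weighted-matching algorithm; it only checks the standard k-anonymity condition (every combination of quasi-identifier values is shared by at least k records). The record fields (`customer`, `city`, `product`, `amount`), the sample values, and the function names are hypothetical and chosen for illustration only.

```python
from collections import Counter


def smallest_group(records, quasi_identifiers):
    """Size of the smallest group of records that share the same
    quasi-identifier values (0 for an empty table)."""
    groups = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return min(groups.values(), default=0)


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs at least k times."""
    return smallest_group(records, quasi_identifiers) >= k


# Hypothetical sales orders whose customer names were already replaced
# by meaningless symbols (pseudonyms).
orders = [
    {"customer": "c01", "city": "Beijing", "product": "bike", "amount": 1200},
    {"customer": "c02", "city": "Beijing", "product": "bike", "amount": 900},
    {"customer": "c03", "city": "Tianjin", "product": "helmet", "amount": 80},
]
quasi_identifiers = ["city", "product"]

# The Tianjin/helmet order is unique, so an attacker who knows a customer's
# city and product from background knowledge can still single it out.
print(is_k_anonymous(orders, quasi_identifiers, 2))

# Generalizing the quasi-identifiers (an assumed, very coarse generalization)
# merges all records into one group of size 3.
generalized = [dict(r, city="North China", product="*") for r in orders]
print(is_k_anonymous(generalized, quasi_identifiers, 2))
```

The coarse generalization in the last step satisfies 2-anonymity but discards most of the detail in the quasi-identifiers, which is the trade-off that a data-utility metric of the kind the abstract mentions is meant to quantify.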
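[Degree-granting institution]: Beijing Jiaotong University
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP309
Article ID: 2217252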
Article link: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/2217252.html