Research on a Privacy Protection Method for SaaS Based on Attribute Clustering and a Bidding Mechanism
Published: 2018-03-23 03:32
Topic: SaaS; Focus: privacy protection; Source: Shandong University, master's thesis, 2015; Type: degree thesis
[Abstract]: Software as a Service (SaaS) has spread along with the growth of cloud computing. Tenants such as enterprises run their business on the services a SaaS provider offers, which saves the labor and money otherwise spent installing and maintaining software. Tenants also pay according to the resources they need and the length of the lease, which avoids the waste of resources and money caused by buying large software packages just to process a small amount of data. However, SaaS application software runs on the provider's servers rather than on the tenant's local servers, so a tenant that wants to use a SaaS service must store its data on the provider's servers. If that data is left unprocessed, confidential company information and personal private data may leak, and data security is threatened. How to solve the data privacy problem effectively has therefore become a focus of research institutions and scholars.

Existing privacy protection methods fall into two main classes: data encryption and data obfuscation. Data encryption transforms data with algorithms built on mathematically hard problems or irreversible processes, so that an attacker who obtains leaked data still cannot understand what it means. Data obfuscation hides the original data through generalization or anonymization. Generalization widens a discrete value into a continuous interval that contains the original value; anonymization partitions the data or mixes in dummy records so that the records within each group are equivalent, which hides information about the data distribution. To raise the level of protection, however, encryption algorithms are generally designed to be complex, so encryption and decryption take a long time, which is unacceptable for real-time SaaS applications. Data obfuscation is faster than encryption, but the original data cannot be reconstructed and dirty data is produced.

To address these problems, this thesis proposes the concept of data partitioning. Data partitioning vertically splits tenant identity information (such as name, ID card number, and social security number) and private data (such as disease and salary) into different data chunks, obscuring the correspondence between them. The method is more efficient than data encryption, and the logical structure of a tenant's original data can be reconstructed from the correspondence between chunks, which avoids distorting tenant data. The finer the partitioning granularity, the more secure the data, but the more time it takes to recombine it. The challenge is to partition the data so that user privacy is protected while the application responds as quickly as possible. The thesis builds an attribute affinity matrix from statistics on how users access the data, clusters that matrix with the bond energy algorithm, and then splits the clustered matrix subject to the privacy constraints each user specifies. The result is an optimal privacy partitioning strategy: under it, application operations need the fewest joins across data chunks and performance is best.

With the same service resources, a higher level of privacy protection comes at the cost of some computation speed, and the computation speed of an application service is positively correlated with the amount of computing resources such as memory and CPU. The question is how to allocate a fixed pool of computing resources among users so that each user meets its own privacy protection requirements while resource utilization is maximized. The thesis regulates users' resource requests globally through a two-stage bidding mechanism made up of an independent pricing algorithm and a centralized pricing algorithm. Independent pricing refers to historical pricing data for the same (or similar) demand and quickly quotes a price based on the share of resources requested. Centralized pricing builds a utility function from the resource requests and privacy protection service requirements submitted by multiple tenants, and uses a multi-objective particle swarm algorithm, guided by Pareto dominance among candidate solutions, to find the best resource allocation and prices.
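The partitioning pipeline described in the abstract (access statistics, attribute affinity matrix, bond energy clustering, constraint-limited split) can be illustrated with a small sketch. This is not the thesis's implementation: the attribute indices, the query workload, the forbidden-pair form of the privacy constraints, and the join-count cost function are all assumptions made for illustration, and the split step uses brute-force enumeration, which is only practical for a handful of attributes.

```python
# Hypothetical attributes: 0 = name, 1 = id_number, 2 = disease, 3 = salary.

def affinity_matrix(queries, n_attrs):
    """aff[i][j] = total frequency of queries that read attributes i and j together."""
    aff = [[0] * n_attrs for _ in range(n_attrs)]
    for attrs, freq in queries:
        for i in attrs:
            for j in attrs:
                aff[i][j] += freq
    return aff

def bond(aff, a, b):
    """Bond between two columns: their inner product (0 if a neighbor is missing)."""
    if a is None or b is None:
        return 0
    return sum(row[a] * row[b] for row in aff)

def bea_order(aff):
    """Bond energy algorithm: insert each column where it adds the most bond energy."""
    n = len(aff)
    order = list(range(min(2, n)))
    for col in range(2, n):
        best_pos, best_gain = 0, None
        for pos in range(len(order) + 1):
            left = order[pos - 1] if pos > 0 else None
            right = order[pos] if pos < len(order) else None
            gain = (2 * bond(aff, left, col) + 2 * bond(aff, col, right)
                    - 2 * bond(aff, left, right))
            if best_gain is None or gain > best_gain:
                best_pos, best_gain = pos, gain
        order.insert(best_pos, col)
    return order

def split_with_constraints(order, queries, forbidden):
    """Try every way of cutting the ordered attributes into contiguous chunks.

    A split is valid if no chunk holds both attributes of a forbidden pair.
    Cost (an assumed proxy for joins) = sum of freq * (chunks touched - 1).
    Returns (cost, chunks) for the cheapest valid split, or None if none exists.
    """
    n = len(order)
    best = None
    for mask in range(2 ** (n - 1)):
        chunks, current = [], [order[0]]
        for k in range(1, n):
            if (mask >> (k - 1)) & 1:      # bit k-1 set: cut before position k
                chunks.append(current)
                current = []
            current.append(order[k])
        chunks.append(current)
        if any(a in c and b in c for c in chunks for a, b in forbidden):
            continue
        cost = sum(freq * (sum(1 for c in chunks if attrs & set(c)) - 1)
                   for attrs, freq in queries)
        if best is None or cost < best[0]:
            best = (cost, chunks)
    return best

# Assumed workload: (set of attributes read, frequency).
queries = [({0, 1}, 10), ({2, 3}, 8), ({1, 3}, 2)]
# Assumed privacy constraints: identity and sensitive attributes must be separated.
forbidden = [(0, 2), (0, 3), (1, 2), (1, 3)]

aff = affinity_matrix(queries, 4)
order = bea_order(aff)
print(order)                                              # [0, 1, 3, 2]
print(split_with_constraints(order, queries, forbidden))  # (2, [[0, 1], [3, 2]])
```

In this toy workload the identity attributes end up in one chunk and the sensitive attributes in another, and only the rare query that reads `id_number` together with `salary` needs a join.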
[Degree-granting institution]: Shandong University
[Degree level]: Master's
[Year degree granted]: 2015
[Classification number]: TP309