Design and Implementation of Data Deduplication on a Cloud Gateway
Published: 2018-08-16 15:15
[Abstract]: As the information age advances, storing and transmitting massive amounts of data has become an important problem that must be solved. Cloud storage offers a good solution for storing massive data, but the lack of a standardized cloud storage application programming interface (API) greatly limits its adoption. This lack of a standard API makes the cloud gateway an indispensable component of cloud storage: the gateway translates protocols between applications and the cloud provider's API. However, because cloud gateways rarely support other enterprise-level services, their scope of application is limited, and most are currently used for archiving and backup. A cloud gateway also adds complexity to cloud storage and limits its performance, so from the standpoint of performance and simplicity it is not well suited to primary applications. To address the high redundancy of backup and archive files, this thesis designs and implements data deduplication on the cloud gateway, which reduces both the network bandwidth used for communication between the gateway and the cloud platform and the storage capacity the data occupies on the cloud platform. The design mainly comprises splitting files into chunks on the gateway, computing a data fingerprint for each chunk, comparing each fingerprint with those in a hash table, discarding duplicate data, and storing the file metadata and the non-duplicate data separately on the Swift cloud platform. Compared with a cloud gateway without deduplication, deduplication on the gateway reduces the storage capacity consumed by highly redundant data by 76% to 91% and saves 70% to 86% of network bandwidth, improving gateway performance with little impact on the gateway's response overhead.
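The deduplication pipeline summarized in the abstract (chunk, fingerprint, look up in a hash table, discard duplicates, store metadata and unique data separately) can be sketched in Python. This is a minimal illustrative sketch, not the thesis's implementation: it assumes fixed-size 4 KiB chunks and SHA-256 fingerprints (the abstract specifies neither), keeps the fingerprint hash table as an in-memory set, and replaces the Swift containers with two in-memory dictionaries rather than real Swift API calls. The names `DedupGateway`, `put_file`, `get_file`, `savings`, and `CHUNK_SIZE` are hypothetical.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field

CHUNK_SIZE = 4096  # assumed fixed chunk size in bytes (not specified in the thesis)


def split_into_chunks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Fixed-size chunking; the final chunk may be shorter than `size`."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def fingerprint(chunk: bytes) -> str:
    """Data fingerprint of one chunk (SHA-256 is an assumption)."""
    return hashlib.sha256(chunk).hexdigest()


@dataclass
class DedupGateway:
    """Hypothetical gateway that deduplicates chunks before 'uploading' them."""

    index: set[str] = field(default_factory=set)  # fingerprint hash table
    chunk_store: dict[str, bytes] = field(default_factory=dict)  # stand-in for a Swift chunk container
    recipes: dict[str, list[str]] = field(default_factory=dict)  # stand-in for a Swift file-metadata container
    logical_bytes: int = 0   # bytes written by clients
    uploaded_bytes: int = 0  # chunk bytes actually sent to the cloud

    def put_file(self, name: str, data: bytes) -> None:
        recipe: list[str] = []
        for chunk in split_into_chunks(data):
            fp = fingerprint(chunk)
            if fp not in self.index:
                # New chunk: record its fingerprint and upload its bytes.
                self.index.add(fp)
                self.chunk_store[fp] = chunk
                self.uploaded_bytes += len(chunk)
            # Duplicate chunks are only referenced by fingerprint, never re-sent.
            recipe.append(fp)
        # File metadata (ordered fingerprints) is kept apart from chunk data.
        self.recipes[name] = recipe
        self.logical_bytes += len(data)

    def get_file(self, name: str) -> bytes:
        """Rebuild a file by concatenating its chunks in recipe order."""
        return b"".join(self.chunk_store[fp] for fp in self.recipes[name])

    def savings(self) -> float:
        """Fraction of logical bytes that did not have to be uploaded."""
        if self.logical_bytes == 0:
            return 0.0
        return 1 - self.uploaded_bytes / self.logical_bytes
```

Writing two backups that share most of their chunks uploads each distinct chunk only once, and `savings()` reports the fraction of logical bytes that never had to be sent, which is one plausible way to express the kind of bandwidth reduction the abstract reports. Fixed-size chunking keeps the sketch short; content-defined chunking generally finds more duplicates when data is shifted by insertions, at the cost of extra CPU time.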
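[Degree-granting institution]: Huazhong University of Science and Technology
[Degree level]: Master's
[Year awarded]: 2013
[Classification number]: TP333
Article ID: 2186369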
Article link: https://www.wllwen.com/kejilunwen/jisuanjikexuelunwen/2186369.html