Design and Implementation of a Deduplication Mechanism for SSDs
Published: 2018-08-13 18:04
【Abstract】: The big data era brings far more disk read and write operations. Solid state disks (SSDs) offer better read/write performance than traditional hard disk drives (HDDs); in particular, their superior random read/write characteristics allow SSDs to markedly improve the storage performance of a computer system. However, SSDs suffer from problems such as write amplification, and their limited number of erase cycles constrains their service life. Reducing the number of writes to an SSD while preserving its excellent read/write performance is therefore an urgent problem.

Deduplication reduces the amount of redundant data stored on disk. Based on an analysis of SSD structure and read/write characteristics, and on the current state of SSD research, this thesis proposes a deduplication mechanism for SSDs to address the problems above. Specifically: a write cache is introduced, which effectively mitigates write amplification and the lifetime problem; a metadata/data separation policy stores the two classes in separate areas of the SSD, writing metadata directly to a metadata storage area and writing application data to a data storage area after deduplication, while raising the residence priority of metadata in the write cache; a bitmap-based physical address allocation policy assigns physical addresses sequentially, balancing writes across the SSD's internal pages; and a virtual block address (VBA) layer is added to the address translation path, reducing the address translation table updates needed during data migration.

A prototype deduplication system for SSDs was designed and implemented. The system is built from functional modules and is readily extensible; the modules implement the mechanisms above (write cache, metadata/data separation, bitmap-based physical address allocation, virtual block addresses, etc.). Functional and performance tests on the prototype show that up to 95% of duplicate data is removed, and that introducing the write cache and metadata separation improves performance by more than 60%.
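The write-cache policy described above, in which metadata is given a higher residence priority than application data, can be pictured with a small sketch. The following C program is a minimal illustration under assumed names and sizes, not the thesis's implementation: when the cache is full, the oldest application-data page is evicted first, and a metadata page is evicted only if no data page remains, so metadata stays cached longer.

```c
/*
 * Minimal sketch (assumed names and sizes, not the thesis's code) of a
 * write cache that gives metadata pages higher residence priority than
 * application data: data pages are evicted first, metadata only as a
 * last resort.
 */
#include <stdio.h>

#define CACHE_SLOTS 4   /* assumed tiny cache, for illustration only */

enum page_type { PAGE_FREE, PAGE_DATA, PAGE_META };

struct cache_slot {
    enum page_type type;
    int lpa;            /* logical page address of the cached write */
    unsigned long age;  /* insertion counter, for LRU within a class */
};

static struct cache_slot cache[CACHE_SLOTS];
static unsigned long clock_tick;

/* Pick a victim: the oldest DATA slot if any exists, else the oldest META slot. */
static int pick_victim(void)
{
    int victim = -1;
    enum page_type wanted = PAGE_DATA;

    for (int pass = 0; pass < 2; pass++) {
        for (int i = 0; i < CACHE_SLOTS; i++) {
            if (cache[i].type == wanted &&
                (victim < 0 || cache[i].age < cache[victim].age))
                victim = i;
        }
        if (victim >= 0)
            return victim;
        wanted = PAGE_META;  /* fall back to metadata only when no data page is left */
    }
    return 0;  /* unreachable when the cache is full */
}

/* Insert a cached write; evict a victim when no slot is free. */
static void cache_write(int lpa, enum page_type type)
{
    int slot = -1;
    for (int i = 0; i < CACHE_SLOTS; i++)
        if (cache[i].type == PAGE_FREE) { slot = i; break; }

    if (slot < 0) {
        slot = pick_victim();
        printf("evict %s lpa=%d\n",
               cache[slot].type == PAGE_META ? "meta" : "data",
               cache[slot].lpa);
        /* a real cache would flush the victim to the SSD here */
    }
    cache[slot].type = type;
    cache[slot].lpa  = lpa;
    cache[slot].age  = clock_tick++;
}

int main(void)
{
    cache_write(0, PAGE_META);
    cache_write(1, PAGE_DATA);
    cache_write(2, PAGE_DATA);
    cache_write(3, PAGE_META);
    cache_write(4, PAGE_DATA);  /* cache full: the oldest DATA page is evicted, not metadata */
    return 0;
}
```

A real write cache would combine this class-based priority with dirty-page bookkeeping and flushing; the two-pass victim search here only shows the priority ordering.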
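The deduplication write path, the bitmap-based sequential allocator, and the extra VBA level in address translation can likewise be sketched together. The program below is a self-contained toy model with assumed page counts, a toy FNV-1a fingerprint, and simplified collision handling; it is not the thesis's design, only an illustration of the mechanisms the abstract names.

```c
/*
 * Sketch of the write path described in the abstract: content
 * fingerprinting for deduplication, a bitmap-scanned sequential
 * allocator for physical pages, and a two-level LPA -> VBA -> PBA
 * mapping.  All sizes and names are illustrative assumptions.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define PAGE_SIZE   4096      /* assumed flash page size           */
#define NUM_PAGES   1024      /* assumed number of physical pages  */
#define NUM_LPAS    1024      /* assumed logical address space     */
#define HASH_SLOTS  2048      /* assumed fingerprint index size    */

static uint8_t  flash[NUM_PAGES][PAGE_SIZE];        /* simulated flash pages      */
static uint32_t alloc_bitmap[(NUM_PAGES + 31) / 32]; /* 1 bit per physical page    */
static int32_t  lpa_to_vba[NUM_LPAS];               /* logical -> virtual block    */
static int32_t  vba_to_pba[NUM_PAGES];              /* virtual -> physical page    */
static int32_t  fp_index[HASH_SLOTS];               /* fingerprint slot -> VBA     */
static uint64_t fp_value[HASH_SLOTS];                /* stored fingerprint per slot */
static int32_t  next_vba = 0;

/* Toy 64-bit FNV-1a fingerprint; a real system would use a stronger hash. */
static uint64_t fingerprint(const uint8_t *buf, size_t len)
{
    uint64_t h = 14695981039346656037ULL;
    for (size_t i = 0; i < len; i++) {
        h ^= buf[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* Bitmap-based allocator: take the first clear bit, so physical pages
 * are handed out in sequential order. */
static int alloc_page(void)
{
    for (int p = 0; p < NUM_PAGES; p++) {
        if (!(alloc_bitmap[p / 32] & (1u << (p % 32)))) {
            alloc_bitmap[p / 32] |= 1u << (p % 32);
            return p;
        }
    }
    return -1;  /* no free page */
}

/* Write one logical page: deduplicate by fingerprint, otherwise allocate
 * a new physical page and record the LPA -> VBA -> PBA mappings.
 * Fingerprint collisions simply overwrite the index slot in this toy. */
static int write_page(int lpa, const uint8_t *data)
{
    uint64_t fp   = fingerprint(data, PAGE_SIZE);
    int      slot = (int)(fp % HASH_SLOTS);

    if (fp_index[slot] >= 0 && fp_value[slot] == fp) {
        /* duplicate content: point the LPA at the existing VBA, no flash write */
        lpa_to_vba[lpa] = fp_index[slot];
        return 0;
    }

    int pba = alloc_page();
    if (pba < 0)
        return -1;
    memcpy(flash[pba], data, PAGE_SIZE);

    int vba = next_vba++;
    vba_to_pba[vba] = pba;
    lpa_to_vba[lpa] = vba;
    fp_index[slot]  = vba;
    fp_value[slot]  = fp;
    return 0;
}

int main(void)
{
    memset(lpa_to_vba, -1, sizeof lpa_to_vba);
    memset(vba_to_pba, -1, sizeof vba_to_pba);
    memset(fp_index,   -1, sizeof fp_index);

    uint8_t buf[PAGE_SIZE];
    memset(buf, 0xAB, sizeof buf);

    write_page(0, buf);   /* first copy: consumes one physical page        */
    write_page(1, buf);   /* duplicate: maps to the same VBA, no new write */

    printf("LPA0 -> VBA %d, LPA1 -> VBA %d\n",
           (int)lpa_to_vba[0], (int)lpa_to_vba[1]);
    return 0;
}
```

Because logical page addresses point at a VBA rather than a physical page, moving a physical page during garbage collection or wear leveling only requires rewriting the corresponding vba_to_pba entry; this is the reduction in address-translation-table updates that the abstract refers to.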
【Degree-granting institution】: Huazhong University of Science and Technology
【Degree level】: Master's
【Year degree conferred】: 2013
【CLC classification number】: TP333
Article ID: 2181767
Link: https://www.wllwen.com/kejilunwen/jisuanjikexuelunwen/2181767.html