Research and Implementation of Hierarchical Storage for File-Based Data

Published: 2018-05-11 09:52

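Topics: file-based data + hierarchical storage management; Source: Kunming University of Science and Technology (昆明理工大学), master's thesis, 2013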

[Abstract]: Data generally falls into two categories: record-level data, which can be represented as two-dimensional tables and is stored and managed by a database management system, and file-based data, which is managed directly by the file system. In typical data processing, operations on file-based data far outnumber operations on record-level data, and the volume of file-based data to be stored is also far larger. This places higher demands on the I/O performance of the storage system and on the capacity of the storage devices. At the same time, data is not created equal: as it moves through its life cycle, its practical value declines and its storage value keeps changing. Storage devices differ in I/O throughput, capacity and price, so allocating them poorly wastes storage resources and raises storage costs. Hierarchical storage is therefore important for storing very large volumes of file-based data, for improving the utilization of all resources in the storage system, and for relieving the I/O and capacity bottlenecks of traditional storage. Most existing hierarchical storage systems target a fixed application scenario, yet the importance of data changes dynamically and its actual storage value differs from one application to another. Hierarchical storage also implies that data must migrate dynamically between storage tiers, and migration delivers value only when it follows a sound migration strategy; most existing strategies consider too few data attributes. To address these problems, this thesis analyzes in detail the state of research on hierarchical storage and its open problems, and on that basis designs a hierarchical storage model for file-based data.

The main contributions are as follows:
1. The data structure of an access-characteristics record table for file-based data is designed to record how data is accessed, and a method is given for computing access frequency, which is not available directly and must be derived. When the system collects access information, it no longer needs to scan the entire storage system; an incremental scan of the record table is sufficient.
2. A migration-object selection strategy is proposed. It uses the access information in the record table to compute the practical value of each piece of data and to determine which data should be migrated.
3. A multi-indicator data migration strategy is proposed, with access frequency as the primary indicator and the remaining space and I/O heat of the storage devices as auxiliary indicators, together with a way to adaptively choose an appropriate strategy from among the candidates.
4. The data migration process is controlled during migration. When an external interrupt request arrives, a corresponding interrupt-handling procedure is applied so that the migration process can continue afterwards.

The experimental results show, on the one hand, that hierarchical storage effectively relieves the capacity bottleneck of the storage devices: remaining space stays within a controllable range, occupied space stays evenly distributed, and the utilization of storage resources improves. On the other hand, they show that using migration control to handle interruptions caused by application access improves system throughput.
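As a rough illustration of the record table, the object-selection step and the multi-indicator strategy described in the abstract, the sketch below keeps an access record table that is updated on each access and scanned incrementally, then plans migrations using access frequency as the primary indicator and free space on the fast tier as an auxiliary one. The class and function names, the frequency formula (accesses per day of file lifetime), the thresholds and the two-tier layout are all assumptions made for illustration; the abstract does not give the thesis's actual data structures or formulas, and I/O heat and interrupt handling are left out.

```python
from dataclasses import dataclass


@dataclass
class AccessRecord:
    """One row of a hypothetical access record table."""
    path: str
    size: int              # arbitrary size units
    tier: str              # "fast" or "slow" (assumed two-tier layout)
    created: float         # seconds on any consistent clock
    access_count: int = 0
    last_access: float = 0.0


class AccessTable:
    """Records accesses and remembers which rows changed since the last scan."""

    def __init__(self):
        self.rows = {}
        self._dirty = set()

    def add_file(self, path, size, tier, now):
        self.rows[path] = AccessRecord(path, size, tier, created=now, last_access=now)
        self._dirty.add(path)

    def record_access(self, path, now):
        row = self.rows[path]
        row.access_count += 1
        row.last_access = now
        self._dirty.add(path)

    def incremental_scan(self):
        """Return only the rows touched since the previous scan, then reset."""
        changed = [self.rows[p] for p in sorted(self._dirty)]
        self._dirty.clear()
        return changed


def frequency(row, now, day=86400.0):
    """Assumed metric: accesses per day, with lifetime floored at one day."""
    age_days = max((now - row.created) / day, 1.0)
    return row.access_count / age_days


def plan_migration(rows, now, fast_free, fast_capacity,
                   hot=5.0, cold=1.0, min_free_ratio=0.2):
    """Return (path, target_tier) pairs without moving any data."""
    plan = []
    free = fast_free
    # Demote cold files first so promotions can reuse the space they free.
    for row in sorted(rows, key=lambda r: frequency(r, now)):
        if row.tier == "fast" and frequency(row, now) < cold:
            plan.append((row.path, "slow"))
            free += row.size
    # Promote hot files only while the fast tier keeps a free-space reserve.
    for row in sorted(rows, key=lambda r: frequency(r, now), reverse=True):
        if row.tier == "slow" and frequency(row, now) >= hot:
            if free - row.size >= min_free_ratio * fast_capacity:
                plan.append((row.path, "fast"))
                free -= row.size
    return plan


table = AccessTable()
table.add_file("/data/a.log", size=40, tier="slow", now=0.0)
table.add_file("/data/b.bin", size=30, tier="fast", now=0.0)
for t in range(10):
    table.record_access("/data/a.log", now=float(t))

changed = table.incremental_scan()
plan = plan_migration(changed, now=10.0, fast_free=70, fast_capacity=100)
print(plan)  # [('/data/b.bin', 'slow'), ('/data/a.log', 'fast')]
```

Planning is kept separate from the actual data movement so that, in a fuller design, a migration controller could pause and resume the plan when application requests arrive. That part is not sketched here.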
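[Degree-granting institution]: Kunming University of Science and Technology (昆明理工大学)
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP333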
