A Tiered Storage Method for Data Records Based on a Composite Frequency Index: Research and Implementation
Published: 2018-09-02 10:11
[Abstract]: As the informatization of society accelerates, business data in fields such as telecommunications and finance is updated continuously and its volume is growing explosively; the data held in computer storage systems has reached the petabyte (PB) scale. This data is stored in enterprise storage systems as records or files. Its value changes over time, so a storage system inevitably accumulates a large amount of rarely used data that occupies much of its usable space.

Faced with growing data volumes, most enterprises first expand the capacity of their existing storage system, but beyond a certain point further expansion becomes difficult. Once the existing system can hold no more data, the enterprise has to pay a high price for an additional storage system of the same class. On the surface this solves the capacity problem. In a storage system holding a large volume of data, however, disk I/O is the main bottleneck for retrieval speed, so adding a system does not speed up retrieval and instead lowers overall performance. The result is a series of problems such as inefficient data processing and slow access for the enterprise's customers. Tiered storage is therefore of great significance for storing very large volumes of record-form data, raising the utilization of all storage resources, and relieving the I/O and capacity bottlenecks of traditional storage.

The thesis first introduces the tiered storage structure, its necessity and advantages, the state of tiered storage research in China and abroad, and the problems found in practical use, and then reviews information lifecycle theory, which is closely tied to tiered storage theory. It analyzes the four basic data storage structures and their characteristics, and describes in detail the typical three-tier storage architecture and the characteristics of each tier.

The thesis then analyzes the basic ideas behind common replacement algorithms, including first-in-first-out (FIFO), optimal replacement, least recently used (LRU), and second chance, and compares their strengths and weaknesses as prototypes of current data tiering algorithms. On that basis it analyzes and compares three typical data migration strategies: the high/low watermark method based on storage space, the cache replacement algorithm based on access frequency, and the value assessment method based on the data lifecycle. By examining the database storage characteristics of application systems in two representative industries, telecommunications and finance, it identifies several problems in both, analyzes how storage tiers are divided under traditional tiered storage, and finally gives a storage tier division scheme for a banking business system together with the design of a tiered storage system for record-form data.

The thesis proposes a tiered data storage method based on a composite frequency index. The algorithm takes into account the capacity of the storage devices, their performance, and the correlation between data items, among other factors, and tries to balance their influence on data access in order to achieve the best access behavior. Simulation experiments on the Windows platform show that, compared with two typical algorithms, the proposed algorithm is more efficient at tiering data and at balancing storage capacity against access speed.
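As an illustration of the high/low watermark migration strategy mentioned in the abstract, the following is a minimal sketch and not the thesis's own algorithm. It assumes a single fast tier with a fixed capacity and demotes the least-accessed records once usage exceeds the high watermark, stopping when usage falls to the low watermark. The `Record` type, the function name `migrate_by_watermark`, the default thresholds of 0.9 and 0.7, and the use of a plain access count to pick records are all assumptions made for this example; the composite frequency index proposed in the thesis is not specified in the abstract and is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical record descriptor used only for this sketch.
    key: str
    size: int          # space the record occupies on the fast tier
    access_count: int  # reads observed during some monitoring window


def migrate_by_watermark(fast_tier, capacity, high=0.9, low=0.7):
    """Return (kept, demoted) after applying a high/low watermark policy.

    Nothing moves unless usage exceeds high * capacity. Once triggered,
    the least-accessed records are demoted until usage is at or below
    low * capacity.
    """
    used = sum(r.size for r in fast_tier)
    if used <= high * capacity:
        return list(fast_tier), []

    kept, demoted = [], []
    # Coldest records first; ties are broken by key so the result is deterministic.
    for record in sorted(fast_tier, key=lambda r: (r.access_count, r.key)):
        if used > low * capacity:
            demoted.append(record)
            used -= record.size
        else:
            kept.append(record)
    return kept, demoted


if __name__ == "__main__":
    tier = [Record("a", 40, 10), Record("b", 30, 1), Record("c", 25, 5)]
    kept, demoted = migrate_by_watermark(tier, capacity=100)
    print("kept:", [r.key for r in kept])
    print("demoted:", [r.key for r in demoted])
```

Using two thresholds instead of one avoids migrating a few records every time usage crosses a single limit: migration starts late and, once started, frees enough space to leave some headroom.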
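[Degree-granting institution]: Kunming University of Science and Technology
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP333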
Article ID: 2219043
Article link: https://www.wllwen.com/kejilunwen/jisuanjikexuelunwen/2219043.html