重复数据删除技术中的并行性能优化算法研究
发布时间:2019-05-28 15:11
【摘要】:互联网技术的不断发展以及新兴网络应用的出现,各种重要数据正在以PB级(千万亿字节)的规模逐年增长。重复数据删除技术通过对存储数据流中冗余数据的定位与消除实现了存储资源的高效利用,使其在网络存储和备份系统中发挥着在越来越重要的作用。作为一种计算密集型和I/O密集型的应用,当存储数据量不断增长时,重复数据删除系统的性能主要受到数据块哈希计算和磁盘索引查重两方面的影响。近年来,随着多核(multi-core)和众核(many-core)处理器的普及,如何在并行环境中实现高效的数据块哈希计算和磁盘索引查重,已经成为重复数据删除性能优化研究中的热点问题。 目前,重复数据删除技术并行加速方法主要围绕着基于GPU的并行计算和基于多线程的并行磁盘索引查重这两种思想展开。然而,随着并行规模的增加,现有算法存在着一定的性能提升瓶颈,严重影响了当前算法的并行可扩展性。研究通过对GPU并行计算和并行查重算法性能评价模型的建立与分析,导出了并行规模扩大下影响系统性能提升的主要因素。针对两种方法中存在的性能瓶颈,提出了相关的优化算法并通过实际数据的测试证明了算法先进性。 在并行系统的性能优化研究中,需要对系统中主要处理环节对整体性能的影响进行详细的量化分析从而找出系统的性能瓶颈。针对GPU并行计算和并行索引表查重两种主要的并行算法,结合其具体处理流程以及其中存在的并发、资源共亨、竞争等因素,研究提出了两种基于随机Petri网的性能分析模型。通过对模型中各处理环节的处理速率和系统利用率的推导计算,得出了影响系统性能主要因素,为并行算法的优化研究提供了理论基础。同时,在后续的优化方法的研究中,通过对实际数据的测试结果分析,分别证明了性能模型推导结果的正确性。 在GPU加速的并行分块和指纹计算的处理方法中,主存储器和GPU全局存储器之间的数据传输延时已经成为影响整体计算性能的瓶颈。在传统GPU加速算法中,不同计算环节中对相同数据的重复传输会造成多余的数据传输开销。为此,提出了一种优化的GPU加速算法,该方法通过对传统方法中数据处理步骤的优化处理,能够有效地减少相同数据在主存和GPU之间的重复传输,从而有效地减轻数据传输时延对系统整体性能的影响。 由于数据的唯一性的需求,为了防止并行查重线程间的数据冲突,在并行磁盘索引查重方法中需要设计一种线程同步机制。随着并行规模的扩大,现有并行查重方法中采用的锁机制带来了巨大的一致性开销,严重影响了并行查重算法的整体性能。为减少并行查询的一致性开销,同时减少磁盘索引的查询延迟,结合分布式索引表的特点,提出了一种基于数据指纹后缀的并行索引查询优化方法。该方法根据数据指纹的哈希后缀将不同的数据指纹的查询任务加载于不同的查询线程以减少并行线程间的一致性开销。实验表明,该方法能够有效地减少系统I/O延时,相对于传统方法而言,能够有效地提高系统整体性能。
[Abstract]:With the development of Internet technology and the emergence of new network applications, all kinds of important data are increasing year by year with the scale of petabytes (millions of bytes). Deduplication technology has realized the efficient utilization of the storage resource by locating and eliminating the redundant data in the storage data stream, making it play an increasingly important role in the network storage and backup system. As a compute-intensive and I/ O-intensive application, the performance of the deduplication system is mainly affected by the data block hash calculation and the disk index check when the amount of storage data is increasing. In recent years, with the popularization of multi-core and many-core processors, how to realize the efficient data block hash calculation and the disk index check in the parallel environment has become a hot issue in the research of repeated data deletion performance optimization. At present, the parallel acceleration method of the repeated data deletion technology is mainly about the parallel computation based on the GPU and the parallel disk index based on the multi-thread. On the other hand, with the increase of the parallel scale, the existing algorithm has a certain performance improvement bottleneck, which seriously affects the parallel scalability of the current algorithm. In this paper, through the establishment and analysis of the performance evaluation model of the GPU parallel computing and parallel checking algorithm, the main cause of the performance improvement of the system under the scale of the parallel scale is derived. In order to solve the performance bottleneck in two methods, the related optimization algorithm is put forward and the algorithm is advanced through the test of the actual data. In the performance optimization of the parallel system, the influence of the main processing links on the overall performance of the system needs to be quantified and analyzed in detail, so as to find out the system's performance This paper presents two main parallel algorithms for GPU parallel computation and parallel index table, combining with the specific processing flow and the factors such as concurrency, resource co-existence and competition, and puts forward two performance points based on the stochastic Petri net. Based on the calculation of the processing rate and the system utilization rate of each processing link in the model, the main factors that affect the performance of the system are obtained, and the optimization research of the parallel algorithm is provided. On the basis of the research of the following optimization methods, the result of the performance model is proved through the analysis of the test results of the actual data. Correctness. Data transmission delay between main memory and GPU global memory has become the whole calculation in the processing method of GPU acceleration parallel block and fingerprint calculation The bottleneck of performance. In the traditional GPU acceleration algorithm, the repeated transmission of the same data in different computing links can cause redundancy according to the transmission overhead, an optimized GPU acceleration algorithm is proposed, which can effectively reduce the repeated transmission of the same data between the main memory and the GPU by optimizing the data processing steps in the conventional method, thereby effectively reducing the data transmission time delay and the whole system as a whole, The effect of the performance is affected by the uniqueness of the data. In order to prevent the data collision among the parallel check-in threads, it is necessary to design one in the parallel disk index checking method. With the expansion of the parallel scale, the lock mechanism adopted in the existing parallel-checking method brings huge consistency overhead, which seriously affects the parallel checking. In order to reduce the consistency of the parallel query, the query delay of the disk index is reduced, and the parallel cable based on the data fingerprint suffix is proposed in combination with the characteristics of the distributed index table. The method comprises the following steps of: loading the query tasks of different data fingerprints on different query threads according to the hash suffix of the data fingerprint so as to reduce the parallel thread; The experiment shows that the method can effectively reduce the I/ O delay of the system.
【学位授予单位】:华中科技大学
【学位级别】:博士
【学位授予年份】:2013
【分类号】:TP333
本文编号:2487159
[Abstract]:With the development of Internet technology and the emergence of new network applications, all kinds of important data are increasing year by year with the scale of petabytes (millions of bytes). Deduplication technology has realized the efficient utilization of the storage resource by locating and eliminating the redundant data in the storage data stream, making it play an increasingly important role in the network storage and backup system. As a compute-intensive and I/ O-intensive application, the performance of the deduplication system is mainly affected by the data block hash calculation and the disk index check when the amount of storage data is increasing. In recent years, with the popularization of multi-core and many-core processors, how to realize the efficient data block hash calculation and the disk index check in the parallel environment has become a hot issue in the research of repeated data deletion performance optimization. At present, the parallel acceleration method of the repeated data deletion technology is mainly about the parallel computation based on the GPU and the parallel disk index based on the multi-thread. On the other hand, with the increase of the parallel scale, the existing algorithm has a certain performance improvement bottleneck, which seriously affects the parallel scalability of the current algorithm. In this paper, through the establishment and analysis of the performance evaluation model of the GPU parallel computing and parallel checking algorithm, the main cause of the performance improvement of the system under the scale of the parallel scale is derived. In order to solve the performance bottleneck in two methods, the related optimization algorithm is put forward and the algorithm is advanced through the test of the actual data. In the performance optimization of the parallel system, the influence of the main processing links on the overall performance of the system needs to be quantified and analyzed in detail, so as to find out the system's performance This paper presents two main parallel algorithms for GPU parallel computation and parallel index table, combining with the specific processing flow and the factors such as concurrency, resource co-existence and competition, and puts forward two performance points based on the stochastic Petri net. Based on the calculation of the processing rate and the system utilization rate of each processing link in the model, the main factors that affect the performance of the system are obtained, and the optimization research of the parallel algorithm is provided. On the basis of the research of the following optimization methods, the result of the performance model is proved through the analysis of the test results of the actual data. Correctness. Data transmission delay between main memory and GPU global memory has become the whole calculation in the processing method of GPU acceleration parallel block and fingerprint calculation The bottleneck of performance. In the traditional GPU acceleration algorithm, the repeated transmission of the same data in different computing links can cause redundancy according to the transmission overhead, an optimized GPU acceleration algorithm is proposed, which can effectively reduce the repeated transmission of the same data between the main memory and the GPU by optimizing the data processing steps in the conventional method, thereby effectively reducing the data transmission time delay and the whole system as a whole, The effect of the performance is affected by the uniqueness of the data. In order to prevent the data collision among the parallel check-in threads, it is necessary to design one in the parallel disk index checking method. With the expansion of the parallel scale, the lock mechanism adopted in the existing parallel-checking method brings huge consistency overhead, which seriously affects the parallel checking. In order to reduce the consistency of the parallel query, the query delay of the disk index is reduced, and the parallel cable based on the data fingerprint suffix is proposed in combination with the characteristics of the distributed index table. The method comprises the following steps of: loading the query tasks of different data fingerprints on different query threads according to the hash suffix of the data fingerprint so as to reduce the parallel thread; The experiment shows that the method can effectively reduce the I/ O delay of the system.
【学位授予单位】:华中科技大学
【学位级别】:博士
【学位授予年份】:2013
【分类号】:TP333
【参考文献】
相关期刊论文 前1条
1 林闯,李雅娟,王忠民;性能评价形式化方法的现状和发展[J];电子学报;2002年S1期
相关博士学位论文 前1条
1 杨鹏;基于广义随机Petri网理论的SIP的研究[D];兰州理工大学;2009年
,本文编号:2487159
本文链接:https://www.wllwen.com/kejilunwen/jisuanjikexuelunwen/2487159.html