Research on Accelerating the DEFLATE Algorithm with Cooperative CPU and GPU Computing

Published: 2018-04-07 16:29

Topic: GPU　Focus: OpenCL　Source: Jilin University, master's thesis, 2013

[Abstract]: Heterogeneous computing is a major trend in the future of high-performance computing. In line with this trend, general-purpose computing on GPUs (GPGPU) faces unprecedented opportunities. In terms of hardware design, graphics processors offer strong floating-point performance and are well suited to large-scale, compute-intensive, data-parallel processing. GPGPU uses these characteristics to carry out general computing tasks outside of graphics processing. As GPU programming technology matures, using GPUs alongside CPUs to build powerful yet low-cost high-performance computing platforms has broad application prospects.

The information age has brought an "information explosion". With the growth of the Internet and mobile communications, the problems of storing and transmitting massive amounts of information have become increasingly prominent, and the role of data compression, along with its social and economic benefits, will become ever more apparent. Without data compression, both data storage and data transmission would be hard to make practical. Compression not only saves storage space for users but also speeds up the transmission of information and reduces communication latency. It also helps save communication bandwidth and the resources consumed in transmitting information.

Many lossless compression algorithms are in common use today, such as DEFLATE, BZIP2, LZMA, and LZMA2; of these, DEFLATE compresses fastest. Our measurements, however, show that DEFLATE's performance on large data files is not as satisfactory as expected. This thesis therefore takes improving the performance of DEFLATE as its practical starting point and studies how GPGPU programming can be used to improve the execution efficiency of everyday application software. Among the implementations of DEFLATE, we chose GZIP, because it is the closest to the description of the DEFLATE algorithm in RFC 1951. As the optimization approach, we chose acceleration through cooperative CPU and GPU computation: we designed a new parallel pipeline mechanism for GZIP, improved the CPU implementation of parts of the algorithm, and used the OpenCL framework to implement as kernels the parts of DEFLATE that are better suited to GPU programming. Tests across several hardware environments show that the final implementation achieves a good speedup on some of the test cases.

In addition to implementing cooperative CPU and GPU acceleration of DEFLATE, this thesis also analyzes and studies the following in some depth:
1. GPU architecture and GPU programming technology. The architectural design characteristics of the GPUs from the two major graphics card vendors, NVIDIA and AMD, are analyzed systematically. The history of GPU programming technology is outlined, and the method for writing general-purpose GPU programs with OpenCL is introduced.
2. Relevant data compression techniques. Taking BZIP2 and DEFLATE as examples, the compression principles and common compression techniques are analyzed. The LZ77 compression coding stage and the Huffman entropy coding stage of DEFLATE are analyzed in detail.
3. The GZIP source code. The analysis covers the structure of the source code, the implementation details of key functions, and the opportunities for optimization. From this analysis, a reasonable optimization solution is constructed and realized in the final improved code.
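The abstract does not give the details of the thesis's pipeline or its OpenCL kernels, so the following is only a minimal CPU-side sketch of a related and simpler idea: splitting the input into blocks and compressing them concurrently with DEFLATE. It assumes Python's standard `zlib` module; the names `compress_chunk`, `parallel_gzip`, and `CHUNK_SIZE` are hypothetical and chosen for illustration. Each block is written as an independent gzip member, and the members are concatenated in their original order, which yields a valid multi-member gzip stream.

```python
import gzip
import zlib
from concurrent.futures import ThreadPoolExecutor

# Block size is an illustrative assumption, not a value from the thesis.
CHUNK_SIZE = 1 << 20  # 1 MiB


def compress_chunk(chunk: bytes, level: int = 6) -> bytes:
    """Compress one block into a complete, independent gzip member."""
    # wbits = 16 + MAX_WBITS asks zlib to wrap the DEFLATE stream
    # (LZ77 matching followed by Huffman coding) in a gzip header/trailer.
    compressor = zlib.compressobj(level, zlib.DEFLATED, 16 + zlib.MAX_WBITS)
    return compressor.compress(chunk) + compressor.flush()


def parallel_gzip(data: bytes, chunk_size: int = CHUNK_SIZE, workers: int = 4) -> bytes:
    """Compress blocks concurrently and concatenate the gzip members in order."""
    # Empty input still produces one (empty) member so the output is valid gzip.
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order, so block order is preserved.
        members = list(pool.map(compress_chunk, chunks))
    return b"".join(members)


if __name__ == "__main__":
    payload = b"DEFLATE combines LZ77 and Huffman coding. " * 50_000
    packed = parallel_gzip(payload, chunk_size=256 * 1024)
    # A standard gzip reader accepts the concatenated members.
    assert gzip.decompress(packed) == payload
    print(len(payload), "->", len(packed), "bytes")
```

The trade-off in this sketch is that each block starts with an empty LZ77 window, so matches cannot cross block boundaries and the compression ratio is typically somewhat lower than compressing the whole input as one stream. Smaller blocks expose more parallelism but make this cost larger.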
[Degree-granting institution]: Jilin University
[Degree level]: Master's
[Year degree conferred]: 2013
[Classification number]: TP338.6; TP391.41

[Similar literature]