GPU Parallel Optimization of a Kakadu-Based JPEG2000 Decoding System
Topics: GPU + JPEG2000; Source: master's thesis by 韩小晴 (Han Xiaoqing), Xidian University, 2014
[Abstract]: JPEG2000 is an image compression standard based on the wavelet transform. Thanks to its strong compression performance at low bit rates, its support for progressive transmission and region-of-interest coding, and its good robustness, it is widely used in remote sensing, aerospace, medicine, the military, meteorology, and other fields. Kakadu is one of the most efficient implementations of the JPEG2000 algorithm currently available. Its distinctive three-layer architecture greatly reduces the complexity of image encoding and decoding, and its object-oriented implementation makes it highly reusable. However, as technology advances, and particularly in aerospace and military applications, image decompression must meet demanding speed requirements. Current CPU-based solutions are costly and inefficient and struggle to meet practical needs. In a CPU, caches and control units consume most of the transistor budget, whereas a graphics processing unit (GPU) devotes more of its transistors to arithmetic logic. A GPU therefore has a large advantage in computing power over a CPU and is well suited to large-scale parallel computation. To improve the efficiency of Kakadu-based image decompression and meet practical application requirements, this thesis proposes a GPU parallel optimization scheme for a Kakadu-based JPEG2000 decoding system, in which the core decoding stages of the Kakadu decoder are implemented on the GPU using high-performance parallel computing techniques.

The thesis introduces the JPEG2000 image compression standard, the development of GPUs, and CUDA programming, and then carries out the GPU parallel optimization of the Kakadu-based JPEG2000 decoding system. The main work is as follows:

1. High-performance parallel implementation of Tier2. The Tier2 module consists of three main parts: packet header parsing, tile header parsing, and codestream organization. A thread-level parallel scheme is adopted: in codestream organization, different GPU threads move the bits at different positions, and the threads run in parallel with one another.
2. High-performance parallel implementation of Tier1. Tier1 decoding is parallelized at the code-block level, since code blocks are independent of one another. One GPU thread block decodes one image, and each thread in that block decodes one code block of the image, so that code blocks are decoded in parallel.
3. High-performance parallel implementation of the inverse wavelet transform. Rows within an image are processed in parallel, while successive images are processed serially. The inverse wavelet transform comprises four steps: pre-scaling, vertical filtering, horizontal filtering, and post-scaling, each of which is accelerated through thread-level parallelism. The number of GPU thread blocks is set to the number of image rows. One thread block processes one row, giving parallelism across rows, and one thread processes one pixel, giving parallelism across pixels.

After the GPU parallel optimization of the Kakadu-based JPEG2000 decoding system, the quality of the decompressed image is identical to that obtained before optimization, and with image quality preserved the decoding speed improves by a factor of 2 to 4. The optimization greatly accelerates the overall decoding system, raises the throughput of JPEG2000 image decompression, and meets the need for real-time decoding of large volumes of image data.
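The row-per-thread-block, pixel-per-thread mapping described for the horizontal filtering step can be illustrated with a small CPU-side emulation. This is only a sketch under stated assumptions, not the thesis's implementation: the abstract does not name the wavelet kernel, so the reversible 5/3 lifting filter from JPEG2000 is assumed here, the pre-scaling and post-scaling steps are left out, rows are assumed to have even length, and all function names are hypothetical. Plain Python loops stand in for GPU thread blocks and threads.

```python
def forward_53_row(row):
    """Reversible 5/3 forward lifting on one even-length row.

    Helper used only to produce coefficients for a round-trip check.
    """
    half = len(row) // 2
    even, odd = row[0::2], row[1::2]
    # Predict step: detail coefficients (symmetric extension at the right edge).
    high = [odd[n] - (even[n] + even[min(n + 1, half - 1)]) // 2
            for n in range(half)]
    # Update step: approximation coefficients (symmetric extension at the left edge).
    low = [even[n] + (high[max(n - 1, 0)] + high[n] + 2) // 4
           for n in range(half)]
    return low, high


def inverse_53_row(low, high):
    """Inverse 5/3 lifting on one row; each index n plays the role of one thread."""
    half = len(low)
    # Phase 1: every "thread" n recovers one even sample from the inputs only,
    # so all n are independent of one another.
    even = [low[n] - (high[max(n - 1, 0)] + high[n] + 2) // 4
            for n in range(half)]
    # On a GPU, a block-wide barrier such as __syncthreads() would be needed
    # here, because phase 2 reads even samples written by neighbouring threads.
    # Phase 2: every "thread" n recovers one odd sample.
    odd = [high[n] + (even[n] + even[min(n + 1, half - 1)]) // 2
           for n in range(half)]
    out = [0] * (2 * half)
    out[0::2] = even
    out[1::2] = odd
    return out


def inverse_horizontal(low_rows, high_rows):
    """Apply the inverse row filter to every row.

    The outer iteration stands in for the grid: one "thread block" per row.
    Rows do not depend on each other, so the order of iteration is irrelevant.
    """
    return [inverse_53_row(low, high) for low, high in zip(low_rows, high_rows)]
```

Passing the outputs of `forward_53_row` for each image row to `inverse_horizontal` should reproduce the original rows exactly, since the lifting steps use integer arithmetic and are undone in reverse order. The split into two phases is the relevant design point: within a phase each output sample is independent, which is what makes a one-thread-per-pixel assignment possible.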
[Degree-granting institution]: Xidian University
[Degree level]: Master's
[Year degree awarded]: 2014
[Classification number]: TN919.81
Document ID: 1942208
Document link: https://www.wllwen.com/kejilunwen/wltx/1942208.html