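Research on CORDIC-Based Fast Algorithms for Discrete Trigonometric Transforms and Their Implementation
Published: 2018-04-13 04:10
Topics: discrete trigonometric transform + coordinate rotation digital computer (CORDIC); Source: Harbin Institute of Technology, 2014 doctoral dissertation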
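[Abstract]: The discrete trigonometric transform (DTT) plays a very important role in information processing, especially in video and image processing, and its fast algorithms and hardware implementations have long been active research topics in that field. Since the release of the new video compression standard H.265/HEVC, DTTs of the traditional typical sizes no longer meet practical requirements, and fast algorithms for large (especially 2^n-point) and variable transform sizes are set to become a research focus.
In video and image processing, hardware implementations that compute the DTT exactly are essentially mature, and approximate computation has become another effective way to raise computing speed. As users' demands on image quality and processing speed keep rising, a single coding scheme can no longer meet application requirements. Video and image compression coding is moving toward hybrid coding with multiple orthogonal transforms, so designing a unified architecture that implements several orthogonal transforms with good performance is a pressing problem.
Addressing these issues, this dissertation studies fast algorithms for large (2^n-point) DTTs, their hardware implementation based on an improved non-overlapping CORDIC, and unified architectures for discrete orthogonal transforms. The main work is as follows:
1. Fast algorithms for arbitrary 2^n-point DTTs that use CORDIC as the transform kernel are studied. First, fast algorithms for the arbitrary 2^n-point DCT-II and DST-II with CORDIC as the transform kernel are derived by odd-even decomposition, and signal flow graphs with a consistent, regular structure are given. Then, fast algorithms and signal flow graphs for the DCT-III and DST-III are obtained from the duality principle of orthogonal transforms, yielding a new CORDIC-based radix-2 DTT fast algorithm. Compared with existing algorithms, it is superior in hardware complexity, scalability, pipeline design, and modular design, and has the following notable features: it applies to DTTs of any 2^n points; it combines low algorithmic complexity with ease of VLSI implementation; the CORDIC rotation angles in the algorithm form an arithmetic progression; it has a regular butterfly structure and a uniform scaling factor, which makes pipelining easy; and it supports in-place computation.
2. Hardware implementation of the DTT based on non-overlapping CORDIC processing units is studied. First, to address the trade-off between iteration count and accuracy in the conventional non-overlapping CORDIC algorithm, an improved non-overlapping CORDIC (MCORDIC) is proposed that cuts the number of iterations by 50% at the cost of a very small loss of precision. Then, exploiting the fact that the CORDIC rotation angles in the proposed algorithm form an arithmetic progression, reuse and modular design are applied to greatly reduce the number and types of CORDIC units needed to compute the DTT; in theory, a DTT of any 2^n points needs only one type of CORDIC. On this basis, a new design method for DTT systolic arrays is proposed. Systolic arrays designed with this method outperform comparable architectures in circuit delay, throughput, pipelined operation, and hardware complexity, and they resolve two problems: unsynchronized computation timing caused by different types of basic processing elements (PEs), and the presence of several kinds of arithmetic operations within a PE.
3. Building on the proposed fast algorithm, the intrinsic relationships among the four types of DTT are examined. Because DTTs of the same size share the same CORDIC units, different DTT types can be computed by controlling the direction of signal flow, which leads to a CORDIC-based design method for unified DTT architectures. The method applies to DTTs of any 2^n points and can produce a unified architecture for any combination of the four DTTs. Its advantages are a single transform kernel, simple control circuitry, and a high degree of hardware reuse. Several representative unified architectures are designed with the method, and they outperform existing unified architectures in hardware complexity, control complexity, throughput, scalability, modularity, and pipeline design. Design methods for DWHT/DCT-II and Haar-DWT/DCT-II unified architectures are also given.
4. Based on the Haar-DWT/DCT-II unified architecture, a hardware architecture for content-based image compression coding is studied. The architecture compresses the image selectively, using the image's just-noticeable-difference (JND) value as the decision criterion. Because JND is computationally expensive and hard to implement in hardware, an approximate JND algorithm based on the Haar-DWT is proposed; although it yields only an approximation of the JND, it greatly reduces computational complexity. A reconfigurable DCT-II architecture with two operating modes (approximate and non-approximate computation) is designed. The control scheme for content-based compression coding, the reference position for choosing the operating mode, and the selection of the JND threshold are also studied. Experimental results show that the compression coding architecture is practical. It contains no complex arithmetic operations and has very low computational complexity, so it is well suited to VLSI implementation.
This dissertation proposes a new DTT fast algorithm that uses CORDIC as the transform kernel, offering a new approach to the study of DTT fast algorithms. The VLSI implementation of the approximately computed DTT and its unified architecture can meet current needs in video and image compression and are consistent with the field's future direction. Just as the FFT brought a leap forward in the practical use of the DFT, a DTT fast algorithm with FFT-like properties should likewise broaden the application of the DTT. The work is both forward-looking in theory and of practical value.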
[Abstract]: The discrete trigonometric transform (DTT) plays a very important role in information processing, especially in video and image processing. Its fast algorithms and hardware implementations have long been active research topics in the field of information processing.
In video and image processing, hardware implementations that compute the DTT exactly are essentially mature, and approximate computation has become another effective way to improve computing speed. As demands on image quality and processing speed keep rising, a single coding scheme can no longer meet application requirements.
Addressing these research issues, this dissertation studies fast algorithms for large (2^n-point) DTTs, their hardware implementation based on an improved non-overlapping CORDIC, and unified architectures for discrete orthogonal transforms.
1. Fast algorithms for arbitrary 2^n-point DTTs that use CORDIC as the transform kernel are studied. First, fast algorithms for the arbitrary 2^n-point DCT-II and DST-II with CORDIC as the transform kernel are derived by odd-even decomposition, and signal flow graphs with a consistent, regular structure are given.
Then, fast algorithms and signal flow graphs for the DCT-III and DST-III are obtained from the duality principle of orthogonal transforms, yielding a new CORDIC-based radix-2 DTT fast algorithm. Compared with existing algorithms, it is superior in hardware complexity, scalability, pipeline design, and modular design, and has the following notable features:
it applies to DTTs of any 2^n points;
it has low algorithmic complexity and is easy to implement in VLSI hardware;
the CORDIC rotation angles in the algorithm form an arithmetic progression;
it has a regular butterfly structure and a uniform scaling factor, which makes pipelining easy;
it supports in-place computation, among other features.
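The odd-even decomposition named in item 1 can be illustrated with a minimal Python sketch. It is not the dissertation's algorithm: it assumes the unnormalized DCT-II definition X[k] = sum over n of x[n]·cos((2n+1)kπ/(2N)), evaluates cosines directly instead of using CORDIC rotations, and the function names `dct2_direct` and `dct2_odd_even` are hypothetical. It only shows that the even-indexed outputs of an N-point DCT-II reduce to an N/2-point DCT-II of the folded sums, while the odd-indexed outputs come from the folded differences.

```python
import math

def dct2_direct(x):
    """Unnormalized DCT-II computed straight from its definition (O(N^2))."""
    N = len(x)
    return [sum(x[n] * math.cos((2 * n + 1) * k * math.pi / (2 * N))
                for n in range(N))
            for k in range(N)]

def dct2_odd_even(x):
    """Recursive DCT-II by odd-even decomposition; len(x) must be a power of two."""
    N = len(x)
    if N == 1:
        return [x[0]]
    half = N // 2
    # Fold the input about its midpoint.
    u = [x[n] + x[N - 1 - n] for n in range(half)]  # feeds even-indexed outputs
    v = [x[n] - x[N - 1 - n] for n in range(half)]  # feeds odd-indexed outputs
    # Even outputs: a half-size DCT-II of the sums.
    even = dct2_odd_even(u)
    # Odd outputs: evaluated directly here (a half-size DCT-IV of the differences).
    odd = [sum(v[n] * math.cos((2 * n + 1) * (2 * m + 1) * math.pi / (2 * N))
               for n in range(half))
           for m in range(half)]
    out = [0.0] * N
    out[0::2] = even
    out[1::2] = odd
    return out
```

In this sketch the odd half is still computed at quadratic cost; a fast algorithm such as the one the abstract describes would replace that step with structured rotations.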
2. Hardware implementation of the DTT based on non-overlapping CORDIC processing units is studied. First, to address the trade-off between the number of iterations and computational accuracy in the conventional non-overlapping CORDIC algorithm, an improved non-overlapping CORDIC (MCORDIC) is proposed that reduces the number of iterations by 50% at the cost of a very small loss of precision.
Then, exploiting the fact that the CORDIC rotation angles in the proposed algorithm form an arithmetic progression, reuse and modular design are applied to greatly reduce the number and types of CORDIC units required to compute the DTT; in theory, a DTT of any 2^n points needs only one type of CORDIC.
On this basis, a new design method for DTT systolic arrays is presented. Systolic arrays designed with this method outperform comparable architectures in circuit delay, throughput, pipelined operation, and hardware complexity. They also resolve the unsynchronized computation timing caused by different types of basic processing elements (PEs) and the presence of several kinds of arithmetic operations within a PE.
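The trade-off between iteration count and accuracy that item 2 refers to can be seen in a minimal Python sketch of the conventional rotation-mode CORDIC. This is the textbook iteration, not the MCORDIC proposed in the dissertation, and the function name `cordic_rotate` and its parameters are hypothetical. Floating-point multiplications by 2^-i stand in for the hardware shifts, and the gain is divided out at the end rather than folded into a scaling stage.

```python
import math

def cordic_rotate(x, y, theta, iterations=16):
    """Rotate (x, y) by theta radians with conventional rotation-mode CORDIC.

    Assumes |theta| <= about 1.74 rad, the convergence range of the
    elementary angles atan(2^-i).
    """
    z = theta      # residual angle still to be rotated
    gain = 1.0     # accumulated CORDIC gain, removed at the end
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0   # rotate toward zero residual
        t = 2.0 ** -i                 # a right shift by i bits in hardware
        x, y = x - d * y * t, y + d * x * t
        z -= d * math.atan(t)         # table lookup in hardware
        gain *= math.sqrt(1.0 + t * t)
    return x / gain, y / gain
```

Rotating the unit vector (1, 0) by an angle approximates (cos, sin) of that angle. Each additional iteration roughly halves the bound on the residual angle, which is why fewer iterations cost precision in the conventional scheme.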
3. Building on the proposed fast algorithm, the intrinsic relationships among the four types of DTT are examined. Because DTTs of the same size share the same CORDIC units, different DTT types can be computed by controlling the direction of signal flow, which leads to a CORDIC-based design method for unified DTT architectures. The method applies to DTTs of any 2^n points and offers a single transform kernel, simple control circuitry, and a high degree of hardware reuse. The architectures designed with it outperform existing unified architectures in hardware complexity, control complexity, throughput, scalability, modularity, and pipeline design. Design methods for DWHT/DCT-II and Haar-DWT/DCT-II unified architectures are also given.
4. Based on the Haar-DWT/DCT-II unified architecture, a hardware architecture for content-based image compression coding is studied. It uses the image's JND value to decide selectively how to compress, with an approximate JND algorithm based on the Haar-DWT that greatly reduces computational complexity.
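For readers unfamiliar with the Haar-DWT that item 4 builds on, one analysis level can be sketched in Python as follows. This is a generic illustration, not the dissertation's JND approximation or its hardware design. The function names `haar_level` and `haar_inverse` are hypothetical, and the divide-by-two normalization is an assumption chosen because it needs only additions and a one-bit shift in hardware; the orthonormal Haar transform scales by 1/sqrt(2) instead.

```python
def haar_level(x):
    """One level of Haar analysis on an even-length sequence.

    Returns (approx, detail): pairwise averages and pairwise half-differences.
    """
    pairs = len(x) // 2
    approx = [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(pairs)]
    detail = [(x[2 * i] - x[2 * i + 1]) / 2 for i in range(pairs)]
    return approx, detail

def haar_inverse(approx, detail):
    """Reconstruct the sequence from one level of Haar coefficients."""
    x = []
    for a, d in zip(approx, detail):
        x.extend([a + d, a - d])
    return x
```

The detail coefficients are large where neighbouring samples differ strongly, which is the kind of local-activity information a content-based coder can use.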
This dissertation presents a new DTT fast algorithm that uses CORDIC as the transform kernel, offering a new approach to the study of DTT fast algorithms. The VLSI implementation of the approximately computed DTT and its unified architecture can meet current demands in video and image compression.
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Doctoral
[Year conferred]: 2014
[Classification number]: TN919.81
Article ID: 1742860
Article link: https://www.wllwen.com/kejilunwen/wltx/1742860.html