Research on CORDIC-Based Fast Algorithms for Discrete Trigonometric Transforms and Their Implementation

Published: 2018-04-13 04:10

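Topics: discrete trigonometric transform + coordinate rotation digital computer (CORDIC); source: Harbin Institute of Technology, doctoral dissertation, 2014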

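【Abstract】: The discrete trigonometric transform (DTT) plays a very important role in information processing, especially in video and image processing, and its fast algorithms and hardware implementations have long been a research focus in that field. Since the release of the new video compression standard H.265/HEVC, DTTs of the traditional typical lengths can no longer meet practical requirements, and fast algorithms for large lengths (especially 2^n points) and variable lengths are set to become a research focus. In video and image processing, hardware implementations that compute the DTT exactly are essentially mature, and approximate computation has become another effective way to raise computing speed. As users keep demanding higher image quality and processing speed, a single coding method can no longer meet application requirements. Video and image compression coding is moving toward hybrid coding with multiple orthogonal transforms, so designing a unified architecture that implements several orthogonal transforms with good performance is an urgent problem. Addressing these issues, this dissertation studies fast algorithms for large-length (2^n-point) DTTs, their hardware implementation based on an improved non-overlapping CORDIC, and unified architectures for discrete orthogonal transforms. The main work is as follows.

1. Fast algorithms for arbitrary 2^n-point DTTs that use CORDIC as the transform kernel are studied. First, fast algorithms for arbitrary 2^n-point DCT-II and DST-II with CORDIC as the kernel are derived by odd-even decomposition, and signal flow graphs with a consistent regular structure are given. Then, fast algorithms and signal flow graphs for DCT-III and DST-III are obtained from the duality principle of orthogonal transforms, yielding a new CORDIC-based radix-2 DTT fast algorithm. Compared with existing algorithms, it performs better in hardware complexity, scalability, pipeline design and modular design, and has the following prominent characteristics: it applies to DTTs of any 2^n points; it has low algorithmic complexity and is easy to implement in VLSI hardware; the CORDIC rotation angles in the algorithm form an arithmetic progression; it has a regular butterfly structure and a uniform scaling factor, which makes pipelining easy; and it supports in-place computation.

2. Hardware implementation of the DTT based on non-overlapping CORDIC processing units is studied. First, to address the mutual constraint between iteration count and computational precision in the traditional non-overlapping CORDIC algorithm, an improved non-overlapping CORDIC (MCORDIC) is proposed that cuts the number of iterations by 50% at the cost of very little precision. Then, exploiting the fact that the CORDIC rotation angles in the proposed algorithm form an arithmetic progression, reuse and modular design greatly reduce the number and the types of CORDIC units needed to compute the DTT; in theory, a DTT of any 2^n points needs only one type of CORDIC. On this basis, a new DTT systolic-array design method is proposed. Systolic arrays designed with it outperform similar architectures in circuit delay, throughput, pipelined operation and hardware complexity, and they resolve the timing mismatch caused by having different types of processing elements (PEs) as well as the presence of several kinds of arithmetic operations inside a PE.

3. Building on the proposed fast algorithm, the intrinsic relationships among the four types of DTT are examined. Because DTTs of the same length share the same CORDIC units, different DTT types can be computed by controlling the direction of signal flow, which leads to a CORDIC-based design method for unified DTT architectures. The method applies to DTTs of any 2^n points and can realize a unified architecture for any combination of the four DTTs. Its advantages are a single transform kernel, simple control circuitry and a high hardware reuse rate. Several representative unified architectures are designed with the method, and they outperform existing unified architectures in hardware complexity, control complexity, throughput, scalability, modularity and pipeline design. Design methods for DWHT/DCT-II and Haar-DWT/DCT-II unified architectures are also given.

4. On the basis of the Haar-DWT/DCT-II unified architecture, a hardware architecture for content-based image compression coding is studied. The architecture uses the JND value of the image as the criterion for selectively compressing the image. To address the high computational complexity of JND and the difficulty of implementing it in hardware, an approximate JND algorithm based on the Haar-DWT is proposed; although it yields only an approximation of the JND, it greatly reduces computational complexity. A reconfigurable DCT-II architecture with two operating modes (approximate or non-approximate computation) is designed. The control scheme for content-based compression coding, the reference position for selecting the operating mode, and the choice of JND threshold are studied. Experimental results show that the compression coding architecture is feasible. The architecture contains no complex arithmetic operations and has very low computational complexity, so it is very easy to implement in VLSI hardware.

This dissertation proposes a new DTT fast algorithm with CORDIC as the transform kernel, offering a new line of thought and a new method for research on DTT fast algorithms. The VLSI implementation of the approximately computed DTT and its unified architectures can meet current needs in video and image compression and are in line with the future direction of the field. Just as the FFT enabled a leap in the practical use of the DFT, a DTT fast algorithm with FFT-like properties should likewise widen the use of the DTT. The work therefore has both forward-looking theoretical significance and practical application value.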
[Abstract]: The discrete trigonometric transform (DTT) plays a very important role in information processing, especially in video and image processing. Its fast algorithms and hardware implementations have long been a research focus in the field of information processing.

In video and image processing, hardware implementations that compute the DTT exactly are essentially mature, and approximate computation has become another effective way to improve computing speed. As users keep demanding higher image quality and processing speed, a single coding method can no longer meet application requirements.

Addressing these research issues, this dissertation studies fast algorithms for large-length (2^n-point) DTTs, their hardware implementation based on an improved non-overlapping CORDIC, and unified architectures for discrete orthogonal transforms.

1. Fast algorithms for arbitrary 2^n-point DTTs that use CORDIC as the transform kernel are studied. First, fast algorithms for arbitrary 2^n-point DCT-II and DST-II with CORDIC as the kernel are derived by odd-even decomposition, and signal flow graphs with a consistent regular structure are given.
Then, fast algorithms and signal flow graphs for DCT-III and DST-III are obtained from the duality principle of orthogonal transforms, yielding a new CORDIC-based radix-2 DTT fast algorithm. Compared with existing algorithms, it performs better in hardware complexity, scalability, pipeline design and modular design, and has the following prominent characteristics:
it applies to DTTs of any 2^n points;
it has low algorithmic complexity and is easy to implement in VLSI hardware;
the CORDIC rotation angles in the algorithm form an arithmetic progression;
it has a regular butterfly structure and a uniform scaling factor, which makes pipelining easy;
it supports in-place computation, among other features.
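
The odd-even decomposition mentioned in item 1 can be illustrated on the even-indexed outputs alone. The sketch below is not the algorithm of the dissertation: it only checks the standard identity that the even-indexed outputs of an unnormalised N-point DCT-II equal the N/2-point DCT-II of the input folded about its centre. The function names and the test vector are assumptions made for illustration, and the odd-indexed half, where the CORDIC rotations would enter, is not reproduced.

```python
import math


def dct2(x):
    """Unnormalised DCT-II by direct summation:
    X[k] = sum_n x[n] * cos(pi * (2n + 1) * k / (2N))."""
    n_pts = len(x)
    return [
        sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * n_pts))
            for n in range(n_pts))
        for k in range(n_pts)
    ]


def even_outputs_via_half_length(x):
    """Even-indexed DCT-II outputs computed with a half-length DCT-II."""
    n_pts = len(x)
    # Butterfly stage: add each sample to its mirror image about the centre.
    folded = [x[n] + x[n_pts - 1 - n] for n in range(n_pts // 2)]
    return dct2(folded)


# Illustrative 8-point input (hypothetical values).
x = [3.0, -1.0, 4.0, 1.0, -5.0, 9.0, 2.0, -6.0]
full = dct2(x)
half = even_outputs_via_half_length(x)

# Largest difference between X[2k] and the k-th half-length output.
max_err = max(abs(full[2 * k] - half[k]) for k in range(len(x) // 2))
```

Applying the same split recursively to the half-length transform is what gives a radix-2 structure its regular butterfly stages.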

2. Hardware implementation of the DTT based on non-overlapping CORDIC processing units is studied. First, to address the mutual constraint between iteration count and computational precision in the traditional non-overlapping CORDIC algorithm, an improved non-overlapping CORDIC (MCORDIC) is proposed that reduces the number of iterations by 50% at the cost of very little precision.
Then, exploiting the fact that the CORDIC rotation angles in the proposed algorithm form an arithmetic progression, reuse and modular design greatly reduce the number and the types of CORDIC units required to compute the DTT; in theory, a DTT of any 2^n points needs only one type of CORDIC.
On this basis, a new design method for DTT systolic arrays is presented. Systolic arrays designed with this method outperform similar architectures in circuit delay, throughput, pipelined operation and hardware complexity, and they resolve the timing mismatch caused by different types of processing elements (PEs) as well as the presence of several kinds of arithmetic operations inside a PE.
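
The trade-off between iteration count and precision noted in item 2 is visible even in the conventional CORDIC rotation. The following is a minimal floating-point sketch of that conventional algorithm, not of the MCORDIC proposed in the dissertation, whose details the abstract does not give. The function name, the test angle and the iteration counts are assumptions; a hardware version would use fixed-point values and replace the multiplications by powers of two with shifts.

```python
import math


def cordic_rotate(x, y, theta, iterations=16):
    """Rotate (x, y) by theta radians with conventional CORDIC micro-rotations.

    Converges for |theta| up to about 1.74 rad (the sum of all atan(2^-i)).
    """
    # Elementary angles atan(2^-i) and the accumulated constant gain.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))

    z = theta  # residual angle still to be rotated
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0  # direction that drives z toward zero
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]

    # Undo the gain so the result has the same length as the input vector.
    return x / gain, y / gain


# Rotation error for a 30-degree rotation at several iteration counts.
theta = math.pi / 6
errors = {}
for n_iter in (4, 8, 16):
    xr, yr = cordic_rotate(1.0, 0.0, theta, n_iter)
    errors[n_iter] = math.hypot(xr - math.cos(theta), yr - math.sin(theta))
```

Each additional iteration roughly halves the worst-case residual angle, which is why halving the iteration count normally costs precision unless the algorithm itself is changed.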

3. Building on the proposed fast algorithm, the intrinsic relationships among the four types of DTT are examined. Because DTTs of the same length share the same CORDIC units, different DTT types can be computed by controlling the direction of signal flow, and a CORDIC-based design method for unified DTT architectures is proposed. The method applies to DTTs of any 2^n points. Its advantages are a single transform kernel, simple control circuitry and a high hardware reuse rate. The architectures designed with it outperform existing unified architectures in hardware complexity, control complexity, throughput, scalability, modularity and pipeline design. Design methods for DWHT/DCT-II and Haar-DWT/DCT-II unified architectures are also given.
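
One way to see why transforms of the same length can share arithmetic units: with orthonormal scaling, the DCT-III matrix is the transpose of the DCT-II matrix, so both use the same coefficients with the data flowing in the opposite direction. The sketch below checks this numerically with plain matrices. It does not reproduce the unified architecture of the dissertation, and the helper names and test vector are assumptions.

```python
import math


def dct2_matrix(n_pts):
    """Orthonormal DCT-II matrix: row k, column n."""
    rows = []
    for k in range(n_pts):
        scale = math.sqrt(1.0 / n_pts) if k == 0 else math.sqrt(2.0 / n_pts)
        rows.append([
            scale * math.cos(math.pi * (2 * n + 1) * k / (2 * n_pts))
            for n in range(n_pts)
        ])
    return rows


def transpose(matrix):
    """Swap rows and columns."""
    return [list(col) for col in zip(*matrix)]


def matvec(matrix, vec):
    """Matrix-vector product."""
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


c2 = dct2_matrix(8)
c3 = transpose(c2)  # orthonormal DCT-III: same coefficients, reversed flow

# Illustrative input (hypothetical values): DCT-II followed by DCT-III.
x = [3.0, -1.0, 4.0, 1.0, -5.0, 9.0, 2.0, -6.0]
roundtrip = matvec(c3, matvec(c2, x))
roundtrip_err = max(abs(a - b) for a, b in zip(roundtrip, x))
```

Because the two matrices hold identical entries, a design that can run its signal flow graph in either direction needs no extra coefficient storage for the second transform.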

4. On the basis of the Haar-DWT/DCT-II unified architecture, a hardware architecture for content-based image compression coding is studied.
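
For readers unfamiliar with the Haar-DWT referred to in item 4, one decomposition level reduces to pairwise averages and differences, which is why it suits low-complexity hardware. The sketch below shows only that single level and its inverse on a row of sample values. It is an illustration under assumed names and data, not the compression architecture studied in the dissertation.

```python
def haar_level(signal):
    """One level of the unnormalised Haar DWT on an even-length sequence."""
    approx = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Rebuild the original sequence from averages and differences."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([a + d, a - d])
    return out


# Hypothetical row of pixel intensities.
pixels = [52, 55, 61, 66, 70, 61, 64, 73]
approx, detail = haar_level(pixels)
restored = haar_inverse(approx, detail)
```

The division by two is a one-bit right shift in fixed-point hardware, so a level of this transform needs only adders and wiring.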

This dissertation presents a new DTT fast algorithm with CORDIC as the transform kernel, which offers a new line of thought and a new method for research on DTT fast algorithms. The VLSI implementation of the approximately computed DTT and its unified architectures can meet current demands in the field of video and image compression.

【Degree-granting institution】: Harbin Institute of Technology
【Degree level】: Doctoral
【Year degree conferred】: 2014
【Classification number】: TN919.81

【References】