Design and Implementation of a High-Performance Fixed-Point Arithmetic Unit in a SIMD DSP

Published: 2018-04-25 08:11

Topic keywords: 银河飞腾迈创 (YHFT-Matrix) + SIMD; Source: National University of Defense Technology, master's thesis, 2012

[Abstract]: In embedded computing fields such as video and image processing, radar signal processing, and wireless communication, data volumes are large, data parallelism is high, and both computational precision and real-time performance are demanding. These workloads are also dense in multiplications and additions, so a digital signal processor's capacity for mixed multiply-add operations and parallel operations is increasingly important. Building on the development of the "YHFT-Matrix DSP", this thesis studies and designs a high-performance fixed-point arithmetic unit for SIMD DSPs to meet that need. The unit integrates addition/subtraction, multiplication, multiply-add, multiply-subtract, dot-product, and complex-number operations, all of which support parallel processing. The main work and contributions of the thesis are as follows:
(1) A SIMD adder is implemented using the Kogge-Stone tree structure of parallel-prefix adders, with sign-bit control and carry control, and with saturation handling added. The adder performs 8/16/32/40-bit SIMD addition and subtraction on signed and unsigned operands, and works in both saturating and non-saturating modes.
(2) A 16-bit SIMD multiplier is built by combining two 16×8 multipliers through sign preprocessing and concatenation. Each 16×8 multiplier uses radix-4 Booth encoding, a Wallace compression tree composed mainly of 5-2 and 4-2 compressors, and a parallel-prefix Kogge-Stone tree as the final adder. A 32-bit SIMD multiplier is also designed, supporting 8/16/32 × 16/32-bit signed and unsigned SIMD multiplication.
(3) Based on an analysis of the instruction requirements of the MiBench algorithms, the LTE protocol, 4G wireless protocols, and the core algorithms of H.264, a high-performance fixed-point arithmetic unit with a 4-stage pipeline is designed. The unit efficiently executes highly parallel multiplication-intensive and addition-intensive operations. It is used in the YHFT-Matrix DSP chip, which has been taped out successfully; tests on the SDK board show that the unit meets the multiplication-intensive and addition-intensive embedded computing needs targeted by the SIMD DSP.
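The SIMD adder in contribution (1) can be illustrated with a small behavioral model. The sketch below is a software approximation, not the thesis's circuit: it assumes a 32-bit datapath split into 8-, 16-, or 32-bit lanes, and it interprets "carry control" as blocking carry propagation at lane boundaries inside a Kogge-Stone prefix network, which is one plausible reading of the abstract. The names `simd_add` and `lane_masks` and all parameters are hypothetical.

```python
def lane_masks(width, lane_bits):
    """Return bit masks marking the LSB and the MSB of every lane."""
    lsb = sum(1 << i for i in range(0, width, lane_bits))
    return lsb, lsb << (lane_bits - 1)


def simd_add(a, b, lane_bits, subtract=False, signed=True,
             saturate=False, width=32):
    """Lane-wise a+b (or a-b) on packed words; assumes width % lane_bits == 0."""
    full = (1 << width) - 1
    lsb, _ = lane_masks(width, lane_bits)
    a &= full
    # Subtraction is a + ~b + 1, with the +1 injected at each lane's LSB.
    b = (~b if subtract else b) & full
    cin = lsb if subtract else 0

    p = a ^ b                    # per-bit propagate
    g = (a & b) | (p & cin)      # per-bit generate, carry-in folded into the LSB
    G, P = g, p
    d = 1
    while d < lane_bits:
        # A bit may combine with the bit d positions below it only if that
        # bit lies in the same lane; this is the assumed "carry control".
        allow = sum(1 << i for i in range(width) if i % lane_bits >= d)
        G, P = (G | (P & (G << d) & allow),
                (P & (P << d) & allow) | (P & ~allow))
        d *= 2

    # Carry into each bit: prefix result of the bit below, or cin at a lane LSB.
    carry_in = (((G << 1) & ~lsb) | cin) & full
    total = p ^ carry_in
    if not saturate:
        return total

    lane_mask = (1 << lane_bits) - 1
    out = 0
    for shift in range(0, width, lane_bits):
        top = shift + lane_bits - 1
        c_out = (G >> top) & 1           # carry out of the lane's MSB
        c_msb = (carry_in >> top) & 1    # carry into the lane's MSB
        lane = (total >> shift) & lane_mask
        if signed:
            if c_out != c_msb:           # two's-complement overflow
                a_neg = (a >> top) & 1
                lane = (1 << (lane_bits - 1)) if a_neg else (lane_mask >> 1)
        elif subtract:
            if c_out == 0:               # borrow: clamp to 0
                lane = 0
        elif c_out == 1:                 # unsigned overflow: clamp to max
            lane = lane_mask
        out |= lane << shift
    return out


# Four signed 8-bit lanes: 0x7F+0x01 and 0x80+0x80 overflow and are clamped.
print(hex(simd_add(0x7F01FF80, 0x01010180, 8, saturate=True)))  # 0x7f020080
```

In this model the signed/unsigned distinction only affects the saturation step, since two's-complement addition produces the same bits in both cases. How the thesis's sign-bit control is actually wired is not stated in the abstract.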
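[Degree-granting institution]: National University of Defense Technology
[Degree level]: Master's
[Year degree conferred]: 2012
[Classification number]: TP332.2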
