Research on the Architecture and VLSI Implementation of an Energy-Efficient Hybrid Floating-Point FFT Hardware Accelerator

Published: 2018-01-18 06:15

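Keywords: Research on the Architecture and VLSI Implementation of an Energy-Efficient Hybrid Floating-Point FFT Hardware Accelerator　Source: Fudan University, 2014 master's thesis　Document type: degree thesis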

[Abstract]: The fast Fourier transform (FFT) is one of the most widely used algorithms in digital signal processing and has long been an active research topic in that field. Today the FFT is a key processing block in many emerging applications, such as handheld mobile communication systems based on orthogonal frequency-division multiplexing (OFDM) and biomedical signal processing platforms. These applications share one notable requirement: very low system power consumption, so that the product can run longer. They also require the system to adapt well and to produce satisfactory results for different input signals. An FFT hardware accelerator must therefore be energy-efficient, low-cost, and flexible while guaranteeing a given output signal-to-quantization-noise ratio (SQNR). To meet these requirements, this thesis optimizes the design of an FFT hardware accelerator at both the algorithm level and the circuit level.

At the algorithm level, the thesis surveys the data representation formats commonly used in hardware FFT implementations: fixed-point, floating-point, and methods based on fixed-point scaling. Building on these formats, it proposes a hybrid floating-point method with dynamic bias adjustment. The method combines the exponent field of a floating-point format with the fraction field of a fixed-point format, and the real and imaginary parts of each complex number share a single exponent field. This reduces hardware cost and power consumption while preserving data precision. In addition, dynamic bias adjustment shifts the representable range during computation according to the input signal, which raises the overall SQNR and gives the accelerator both flexibility and high-precision output. An FFT hardware accelerator using this method therefore reaches a higher SQNR with a smaller data word length, which lowers power consumption and cost.

At the circuit level, the accelerator uses a single-memory architecture to reduce hardware overhead. The datapath applies several techniques to lower power and improve SQNR. First, the floating-point normalization operations needed in the butterfly computation are analyzed and reduced from 15 to 4. Second, the datapath word length needed in the butterfly computation is analyzed and shortened; with a 9-bit fraction, the intermediate word length can be reduced by up to 6 bits. Third, the Trounding rounding strategy is adopted to keep quantization error as low as possible without adding much hardware overhead.

Finally, the thesis considers an FFT hardware accelerator built on low-voltage memory. It first outlines the types and causes of memory faults, then describes a method for analyzing and simulating the memory failure rate at a given voltage, and then gives the relationship between failure rate, voltage, and circuit frequency. From this relationship it derives the SQNR of the accelerator at a given memory voltage and the corresponding power savings. The proposed accelerator computes transforms of 64 to 8192 points. With a data word length of 3+2*9 bits, a memory voltage of 0.7 V, and the SMIC 65 nm process, the accelerator runs at 400 MHz, occupies 0.482 mm², and consumes 35.3 mW. The SQNR is 41.6 dB for the 64-point transform and 35.8 dB for the 8192-point transform.
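
The shared-exponent format described in the abstract can be illustrated with a small sketch. It is not the thesis's implementation: the abstract gives only the 3+2*9-bit word length, so the encoding below (an unsigned 3-bit exponent, two signed 9-bit two's-complement fractions, round-to-nearest instead of Trounding, and a bias passed in by hand rather than adjusted during the transform) is an assumption, and the names `quantize`, `dequantize`, `roundtrip`, and `sqnr_db` are hypothetical.

```python
import math
import random

EXP_BITS = 3    # shared exponent width, taken from the "3+2*9" word length
FRAC_BITS = 9   # signed fraction width for each of the real and imaginary parts


def _scale(e, bias, frac_bits):
    # Weight of one fraction LSB for exponent e (assumed encoding).
    return 2.0 ** (e - bias - (frac_bits - 1))


def quantize(z, bias=0, exp_bits=EXP_BITS, frac_bits=FRAC_BITS):
    """Encode complex z as (e, m_re, m_im) with one exponent shared by both parts."""
    m_max = (1 << (frac_bits - 1)) - 1
    m_min = -(1 << (frac_bits - 1))
    e_max = (1 << exp_bits) - 1
    # Pick the smallest exponent at which both fractions fit.
    for e in range(e_max + 1):
        s = _scale(e, bias, frac_bits)
        m_re, m_im = round(z.real / s), round(z.imag / s)
        if m_min <= m_re <= m_max and m_min <= m_im <= m_max:
            return e, m_re, m_im
    # Out of range even at the largest exponent: saturate both fractions.
    clamp = lambda m: max(m_min, min(m_max, m))
    return e_max, clamp(m_re), clamp(m_im)


def dequantize(e, m_re, m_im, bias=0, frac_bits=FRAC_BITS):
    """Decode (e, m_re, m_im) back to a Python complex number."""
    s = _scale(e, bias, frac_bits)
    return complex(m_re * s, m_im * s)


def roundtrip(z, bias=0):
    """Quantize and immediately decode, to expose the quantization error."""
    return dequantize(*quantize(z, bias=bias), bias=bias)


def sqnr_db(reference, approx):
    """Signal-to-quantization-noise ratio in dB between two complex sequences."""
    signal = sum(abs(r) ** 2 for r in reference)
    noise = sum(abs(r - a) ** 2 for r, a in zip(reference, approx))
    return math.inf if noise == 0 else 10 * math.log10(signal / noise)


if __name__ == "__main__":
    random.seed(0)
    # A low-amplitude test signal: a larger bias moves the range toward small values.
    xs = [complex(random.uniform(-0.01, 0.01), random.uniform(-0.01, 0.01))
          for _ in range(1000)]
    for bias in (0, 4):
        print(bias, sqnr_db(xs, [roundtrip(z, bias) for z in xs]))
```

In this sketch the bias only decides where the eight available exponent steps sit on the magnitude axis. For a low-amplitude input, a larger bias gives a finer step and so a higher SQNR, which is the effect the abstract attributes to adjusting the bias according to the input signal.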
[Degree-granting institution]: Fudan University
[Degree level]: Master's
[Year degree conferred]: 2014
[Classification number]: TN911.72;TN47

[Similar literature]