Research on Hardware Architecture Synthesis for the Floating-Point Fourier Transform
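Keywords: discrete Fourier transform; floating point; fixed point; product of coprime numbers; FPGA; ASIC; automatic generation; synthesis; convolutional neural network. Author: 冯淦. Source: University of Science and Technology of China, master's thesis, 2017. Document type: degree thesis.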
[Abstract]: The discrete Fourier transform (DFT) is used in almost every area of scientific and engineering computing. Modern large-scale data-processing applications, such as audio and video signal processing, increasingly rely on features that are computationally complex and demanding in hardware: hardware DFT units with very long transform lengths or lengths that are not a power of two, and floating-point arithmetic with a wide dynamic range and high effective precision. Applications such as audio and video coding, orthogonal frequency-division multiplexing (OFDM), and big-data processing need hardware arithmetic units to meet real-time constraints, IEEE-754-compliant floating-point numbers to meet accuracy and generality requirements, and long or non-power-of-two transform lengths to meet sample-count requirements. This thesis proposes a matrix-decomposition-based Fourier transform algorithm for non-power-of-two lengths that are products of coprime numbers, and designs AutoNFT, a DFT hardware architecture synthesis tool that implements the algorithm. The main contributions are as follows.
The thesis studies a matrix-decomposition-based DFT algorithm for lengths that are products of pairwise coprime numbers. Compared with existing algorithms for lengths of the form "small odd number (3, 5, 9) times a power of two", it covers a wider range of lengths. Its correctness is proved by rigorous mathematical derivation, and formulas for the input and output ordering, which differ from those of the traditional algorithm, are given so that coprime-length DFT modules can be cascaded.
The AutoNFT synthesis tool automatically generates fully pipelined hardware DFT units. It supports power-of-two lengths and lengths that are products of pairwise coprime numbers, is highly portable, and supports both fixed-point and floating-point samples. An automatic generation algorithm for the fully pipelined structure and for automatic cascading is proposed; using first-in-first-out (FIFO) units built from shift registers, it handles the L-shaped structure of the split-radix algorithm, which is more efficient than radix-2/4 algorithms. High-performance floating-point adder and multiplier units with eight pipeline stages are designed; they can run at 1 GHz in the SMIC 40 nm process.
On the Zynq 7000 platform, the thesis verifies the fixed-point and floating-point arithmetic units, a handwritten-digit neural network, and 16-point and 15-point floating-point DFT units. An FPGA implementation of the handwritten-digit recognition network LeNet-5 is presented. Compared with implementations on general-purpose devices such as CPUs and GPUs, it reaches the same low error rate as the software algorithm, 0.999%, while running 37% faster than Caffe and reducing energy consumption by as much as 93.7%. The thesis also completes synthesis and simulation of fixed-point and floating-point DFT units for long lengths and prime-product lengths in the SMIC 40 nm process at 500 MHz. In particular, the 256-point DFT unit processes 115 billion fixed-point samples per second, and the 30-point DFT unit processes 13.5 billion floating-point samples per second.
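To make the idea of a coprime-length DFT with reordered inputs and outputs concrete, here is a minimal Python sketch. It uses the textbook Good-Thomas prime-factor index mapping, which is an assumption chosen for illustration; the abstract does not give the thesis's own matrix decomposition or ordering formulas, and these may differ. The function names `dft` and `pfa_dft` are hypothetical.

```python
import cmath


def dft(x):
    """Naive O(N^2) DFT, used both as the small sub-transform and as a reference."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
        for k in range(n)
    ]


def pfa_dft(x, n1, n2):
    """DFT of length n1*n2 for coprime n1, n2 via Good-Thomas index mapping.

    Illustrative sketch only (not the thesis's algorithm). No twiddle-factor
    multiplications are needed between the two stages.
    """
    n = n1 * n2
    if len(x) != n:
        raise ValueError("input length must equal n1 * n2")
    # Modular inverses (Python 3.8+); pow raises ValueError if n1, n2 are not coprime.
    inv2 = pow(n2, -1, n1)  # n2 * inv2 == 1 (mod n1)
    inv1 = pow(n1, -1, n2)  # n1 * inv1 == 1 (mod n2)

    # Input reordering: sample index (i1*n2 + i2*n1) mod n goes to grid[i1][i2].
    grid = [[x[(i1 * n2 + i2 * n1) % n] for i2 in range(n2)] for i1 in range(n1)]

    # Stage 1: length-n2 DFT along each row.
    rows = [dft(row) for row in grid]

    # Stage 2: length-n1 DFT along each column; cols[k2][k1] holds the result.
    cols = [dft([rows[i1][k2] for i1 in range(n1)]) for k2 in range(n2)]

    # Output reordering via the Chinese remainder theorem.
    out = [0j] * n
    for k1 in range(n1):
        for k2 in range(n2):
            k = (k1 * n2 * inv2 + k2 * n1 * inv1) % n
            out[k] = cols[k2][k1]
    return out


def max_error(a, b):
    """Largest absolute difference between two equal-length sequences."""
    return max(abs(p - q) for p, q in zip(a, b))
```

For example, `pfa_dft(x, 3, 5)` computes a 15-point transform and `pfa_dft(x, 5, 6)` a 30-point one, the two coprime-product sizes mentioned in the abstract; `max_error(pfa_dft(x, 3, 5), dft(x))` should be at the level of floating-point rounding. Because the two stages are separated only by index permutations, each stage can in principle be a self-contained module, which is what makes cascading attractive in hardware.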
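[Degree-granting institution]: University of Science and Technology of China
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP301.6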