
Design and Implementation of FFT Algorithms for Multi-core Vector Processors

Published: 2018-01-01 06:09

  Source: National University of Defense Technology, 2014 master's thesis. Document type: degree thesis


  Keywords: FFT; fused multiply-add; multi-core processor; vectorization; software pipelining


[Abstract]: The FFT is a principal tool of digital signal processing. It plays an important role in high-performance computing and is an important benchmark of processor performance. Given the architectural characteristics of the multi-core vector processor X-DSP, studying efficient methods for designing and implementing vectorized FFTs has both theoretical significance and practical value. This thesis analyzes the properties of the FFT in depth and designs and implements radix-2 FFT, radix-4 FFT, large-point FFT, and mixed-radix FFT programs. The main contributions are as follows.

(1) A radix-2 FFT program for X-DSP is designed and implemented. Targeting an architecture with fused multiply-add (FMA) support, the butterfly units of the DIT and DIF radix-2 FFTs are analyzed and optimized to make full use of FMA instructions, which raises FLOPS throughput. Shuffle requests are combined with memory-access requests, and software pipelining is applied, which improves execution efficiency. Experimental results show that, relative to the CUFFT library, the average performance of the single-precision and double-precision radix-2 FFTs improves by factors of 3.12 and 22.97, respectively; relative to the FFTW library, it improves by factors of 3.52 and 25.29, respectively.

(2) A radix-4 FFT program for X-DSP is designed and implemented. FMA instructions are fully exploited to optimize the butterfly units of the DIT and DIF radix-4 FFTs, and shuffle requests are again combined with memory-access requests. Experimental results show that the DIT radix-4 FFT outperforms the DIT radix-2 FFT by 11.46% to 21.34%, and the DIF radix-4 FFT outperforms the DIF radix-2 FFT by 9.1% on average.

(3) A large-point FFT program for X-DSP is designed and implemented. The large-point FFT algorithm MFA and the iterative FFT are analyzed in detail, and a single-core program based on DMA double buffering is designed and optimized. A method for storing the coefficient (twiddle) factors in compressed form is proposed to save storage space. The parallel blocked MFA algorithm is mapped onto multiple cores and the load balance among cores is optimized, giving an efficient multi-core parallel large-point FFT with an average speedup of 6.43.

(4) The butterfly units of the DIT radix-3 and radix-5 FFTs are analyzed and optimized, and a mixed-radix FFT program for X-DSP is designed and implemented. Experimental results show that the single-precision floating-point mixed-radix FFTs of 1536 and 2400 points take 0.00247 ms and 0.00348 ms, respectively, and that the performance of the mixed-radix FFT improves markedly as the number of points increases.
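The abstract does not give the thesis's actual butterfly formulation, so the following is only a minimal sketch of one well-known way a radix-2 DIT butterfly can be arranged around fused multiply-add. The outputs are x = a + w·b and y = a − w·b; writing y as 2a − x lets the whole butterfly be expressed as six multiply-add operations instead of four multiplies and six additions. The names `fma`, `butterfly_dit_fma`, `fft_radix2_dit`, and `dft` are hypothetical, and `fma` is a plain-Python stand-in for a hardware instruction, not an X-DSP intrinsic.

```python
import cmath


def fma(a, b, c):
    """Stand-in for a hardware fused multiply-add: returns a * b + c.

    Assumption: plain Python rounds twice here, whereas a real FMA
    instruction rounds once. Only the operation structure is illustrated.
    """
    return a * b + c


def butterfly_dit_fma(a, b, w):
    """Radix-2 DIT butterfly written as six multiply-add operations.

    Returns (a + w*b, a - w*b) for complex a, b and twiddle factor w.
    """
    # x = a + w*b: two chained multiply-adds per real/imaginary component.
    x_re = fma(-w.imag, b.imag, fma(w.real, b.real, a.real))
    x_im = fma(w.real, b.imag, fma(w.imag, b.real, a.imag))
    # y = a - w*b = 2a - x: one multiply-add per component.
    y_re = fma(2.0, a.real, -x_re)
    y_im = fma(2.0, a.imag, -x_im)
    return complex(x_re, x_im), complex(y_re, y_im)


def fft_radix2_dit(x):
    """Recursive radix-2 DIT FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft_radix2_dit(x[0::2])
    odd = fft_radix2_dit(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n)  # twiddle factor W_n^k
        out[k], out[k + n // 2] = butterfly_dit_fma(even[k], odd[k], w)
    return out


def dft(x):
    """Direct O(n^2) DFT, used only as a correctness reference."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]
```

Calling `fft_radix2_dit([1, 2, 3, 4])` and comparing against `dft([1, 2, 3, 4])` is a quick way to check the butterfly. On a vector processor the same six operations would be issued across many lanes at once, with the data reordering handled by shuffle and memory-access instructions as the abstract describes.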

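[Degree-granting institution]: National University of Defense Technology
[Degree level]: Master's
[Year degree awarded]: 2014
[Classification number]: TP332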





Article link: https://www.wllwen.com/kejilunwen/jisuanjikexuelunwen/1363248.html

