
Design Optimization and Implementation of a SIMD Arithmetic Unit Supporting Floating-Point Fused Multiply-Add

Published: 2018-02-02 23:52

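  Keywords: SIMD unit; fused multiply-add; unaligned addresses; data reorganization; mask  Source: National University of Defense Technology, 2013 master's thesis  Document type: degree thesis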


[Abstract]: SIMD (Single Instruction Multiple Data) is an important means of improving data-parallel processing capability. With the development of very-large-scale integration (VLSI), mainstream microprocessor vendors keep adding SIMD functionality and widening SIMD units. SIMD nevertheless still has several performance bottlenecks, such as unaligned addresses, data reorganization, and the vectorization of control-dependent code (control flow). This thesis designs a SIMD arithmetic unit supporting floating-point fused multiply-add for a high-performance microprocessor, optimizes it against a scientific-computing background, and carries out synthesis, verification, and performance analysis. The main work is as follows:
1. A 7-stage pipelined double-precision floating-point fused multiply-add (FMA) unit is designed and used to build a basic SIMD module. The performance bottlenecks of SIMD across various applications are analyzed, and a configurable improved SIMD structure is proposed to address unaligned addresses, data reorganization, and control-dependent vectorization.
2. The SIMD arithmetic unit is verified by simulation and analyzed through synthesis. Verification shows that the floating-point computation conforms to the IEEE 754-2008 standard and that the SIMD functions are correct. Synthesis shows that, relative to the basic SIMD, the configurable SIMD increases area and power consumption by 2.04% and 0.46%, respectively. According to the synthesis evaluation, the SIMD unit reaches a frequency of 2 GHz.
3. Taking DAXPY (double-precision multiply-add) with a vector length of 66 and sparse-matrix computation as examples, the performance improvement of the configurable SIMD is analyzed. The results show that the configurable SIMD achieves a speedup of 1.17x to 1.50x over the basic SIMD.
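The two ideas the abstract leans on, a single-rounding fused multiply-add and mask-controlled lanes for a vector length that is not a multiple of the SIMD width, can be illustrated with a small software model. The following is only a sketch under stated assumptions, not the thesis hardware: the lane count `LANES = 4` is an assumption (the abstract does not give the SIMD width), and the helper names `fma` and `daxpy_masked` are hypothetical. The FMA is modeled with exact rational arithmetic followed by one rounding to double.

```python
from fractions import Fraction

# Assumed SIMD width in double-precision lanes; the abstract does not state it.
LANES = 4


def fma(a, b, c):
    """Model of a fused multiply-add for finite doubles.

    The product and sum are formed exactly as rationals and rounded
    to double only once, which is the defining property of FMA.
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


def daxpy_masked(alpha, x, y, lanes=LANES):
    """Compute alpha*x + y in chunks of `lanes` elements.

    The last chunk uses a per-lane mask so that lanes past the end of
    the vector are skipped instead of needing a separate scalar loop.
    Returns the result vector and the number of SIMD steps issued.
    """
    n = len(x)
    out = list(y)
    steps = 0
    for base in range(0, n, lanes):
        # mask[i] is True when lane i maps to a valid element index.
        mask = [base + i < n for i in range(lanes)]
        for i in range(lanes):
            if mask[i]:
                out[base + i] = fma(alpha, x[base + i], out[base + i])
        steps += 1
    return out, steps


if __name__ == "__main__":
    # Fused vs. unfused: the exact product 1 - 2**-60 is lost when it is
    # rounded to double before the addition, but survives in the fused form.
    a = 1.0 + 2.0 ** -30
    b = 1.0 - 2.0 ** -30
    print("unfused:", a * b - 1.0)
    print("fused:  ", fma(a, b, -1.0))

    # Vector length 66 as in the abstract: 16 full steps plus one masked step.
    x = [float(i) for i in range(66)]
    y = [1.0] * 66
    result, steps = daxpy_masked(2.0, x, y)
    print("steps:", steps, "last element:", result[-1])
```

With the assumed four lanes, a length of 66 leaves two valid elements in the final step, which is the kind of remainder case where mask support avoids falling back to scalar code.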
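[Degree-granting institution]: National University of Defense Technology
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP332.2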

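[Co-cited literature]

Related master's theses (first 4)

1 Song Weiwei (宋卫卫); Clock Structure Restructuring and Optimization Considering Common Paths [D]; National University of Defense Technology; 2013

2 Xie Qihua (谢启华); Design and Implementation of a Floating-Point Fused Multiply-Add Unit in a High-Performance Microprocessor [D]; National University of Defense Technology; 2013

3 Liu Yuanlong (刘元龙); Research and Implementation of a Path-Based OCV Analysis Method [D]; National University of Defense Technology; 2013

4 Sun Xiuxiu (孙秀秀); Research and Implementation of a Hold-Time Timing Optimization Method Based on Reused Cells in Physical Design [D]; National University of Defense Technology; 2013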



Article ID: 1485812


Article link: https://www.wllwen.com/kejilunwen/jisuanjikexuelunwen/1485812.html

