Design Optimization and Implementation of a SIMD Execution Unit Supporting Floating-Point Fused Multiply-Add

Published: 2018-02-02 23:52

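Keywords: SIMD unit; fused multiply-add; address misalignment; data reorganization; mask  Source: National University of Defense Technology, 2013 master's thesis  Type: degree thesis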

[Abstract]: SIMD (Single Instruction, Multiple Data) is an important means of improving data-parallel processing capability. With the development of very-large-scale integration (VLSI), mainstream microprocessor vendors have kept adding SIMD functionality and widening SIMD units. SIMD nevertheless still has a number of performance bottlenecks, such as misaligned addresses, data reorganization, and vectorization in the presence of control dependences (control flow). This thesis designs a SIMD execution unit supporting floating-point fused multiply-add for a high-performance microprocessor, optimizes it with scientific computing as the target workload, and carries out synthesis, verification, and performance analysis. The main contributions are: 1. A double-precision floating-point fused multiply-add (FMA) unit with a 7-stage pipeline is designed and used to build a basic SIMD module. The performance bottlenecks of SIMD in various applications are analyzed, and a configurable improved SIMD structure is proposed to address misaligned addresses, data reorganization, and control-dependent vectorization. 2. The SIMD unit is verified by simulation and analyzed through synthesis. Verification shows that floating-point computation conforms to the IEEE 754-2008 standard and that the SIMD functions are correct. Synthesis shows that, relative to the basic SIMD, the configurable SIMD increases area and power by 2.04% and 0.46%, respectively. Overall evaluation indicates that the SIMD unit reaches a frequency of 2 GHz. 3. Using DAXPY (double-precision multiply-add) with a vector length of 66 and sparse matrix computation as examples, the performance gain of the configurable SIMD is analyzed; the results show a speedup of 1.17x to 1.50x over the basic SIMD.
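To make the DAXPY case concrete, here is a minimal sketch of y ← a·x + y processed in fixed-width vector steps, with a mask covering the partial last step. It is an illustration only, not the thesis's design: the lane count of 4 and the names `simd_fma` and `daxpy_masked` are assumptions, since the abstract states neither the SIMD width nor how its mask mechanism works. It shows why a length such as 66, which is not a multiple of the lane count, leaves a tail that a basic SIMD unit must handle separately.

```python
LANES = 4  # assumed SIMD width in doubles; the abstract does not state it


def simd_fma(a, x, y, mask):
    """One simulated SIMD step: lane i computes a*x[i] + y[i] where mask[i] is set.

    Masked-off lanes pass y[i] through unchanged. Python rounds the product
    and the sum separately, so this models the data flow of an FMA, not its
    single rounding.
    """
    return [a * xi + yi if m else yi for xi, yi, m in zip(x, y, mask)]


def daxpy_masked(a, x, y, lanes=LANES):
    """Compute y <- a*x + y in blocks of `lanes`, masking the partial last block."""
    n = len(x)
    out = list(y)
    steps = 0
    for base in range(0, n, lanes):
        active = min(lanes, n - base)          # valid elements in this block
        mask = [i < active for i in range(lanes)]
        # Pad the tail so every step operates on a full-width vector.
        xv = x[base:base + lanes] + [0.0] * (lanes - active)
        yv = out[base:base + lanes] + [0.0] * (lanes - active)
        res = simd_fma(a, xv, yv, mask)
        out[base:base + active] = res[:active]  # write back active lanes only
        steps += 1
    return out, steps


n = 66
x = [float(i) for i in range(n)]
y = [1.0] * n
result, steps = daxpy_masked(2.0, x, y)
```

Under the assumed 4 lanes, 66 elements split into 16 full vector steps plus one step with only 2 active lanes, so `steps` is 17 and the final step is the one that relies on the mask.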
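[Degree-granting institution]: National University of Defense Technology
[Degree level]: Master's
[Year degree conferred]: 2013
[Classification number]: TP332.2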

[Co-cited literature]