Research on an FMA Structure for Reducing Iterative Operation Error

Published: 2018-05-04 18:20

Topics: floating-point fused multiply-add + iterative operation; Source: National University of Defense Technology, 2012 master's thesis

[Abstract]: In the design of new-generation application-oriented multi-core microprocessors, designers have introduced the fused multiply-add (FMA) unit, which can significantly improve floating-point performance. Unlike a floating-point multiplication and a floating-point addition executed separately, an FMA unit rounds only once during execution, which reduces the error of the result. However, the error of an iterative operation grows markedly as the number of operations increases, which seriously affects the reliability of the result. Based on an analysis of the traditional fused multiply-add structure (T-FMA), this thesis studies an improved FMA structure that reduces iterative operation error (IM-FMA). The main results are as follows:
1. The thesis surveys common FMA algorithms and structures and the impact of arithmetic error in large-scale numerical computation. It analyzes the distribution characteristics of the floating-point number system, namely the uniqueness of normalized data and their uniformity within a finite interval, derives the MRRE and ARRE of different data formats under a logarithmic distribution, and reviews the binary data formats and exception handling specified by IEEE 754-2008 together with the basics of error analysis.
2. A mathematical model for the error analysis of floating-point iterative operations is established, and the factors that determine the error produced by an FMA unit during iterative operation are derived: the number of operations n and the rounding unit u. For the same number of operations, a smaller rounding unit gives a smaller iterative error, so the fundamental way to reduce iterative operation error is to reduce the rounding unit.
3. An FMA structure that reduces iterative operation error (IM-FMA) is proposed. The thesis abandons the earlier practice of keeping only a finite-width (53-bit) significand in both single operations and iterative operations. It adds a feedback path to the addend from the intermediate result (106 bits) taken after the normalization shift and before rounding, and uses a dedicated selection module to distinguish the source of the addend in single-operation mode from that in iterative-operation mode. The algorithms, structures, and logic circuit implementations of the key submodules of IM-FMA are also studied: the multiplier module, the addend alignment shift, and leading-zero anticipation.
4. The implemented T-FMA and IM-FMA are verified and tested, and two iterative test sequences are then executed on each. Under test sequence 1, the T-FMA result has an absolute error of 252100 and a relative error of 1.1102E-14, while the IM-FMA result has an absolute error of 0 and a relative error of 0. Under test sequence 2, the T-FMA result has an absolute error of (2521)100 and a relative error of 1.1102E-14, while the IM-FMA result has an absolute error of 100 and a relative error of 2.4652E-30. IM-FMA is therefore effective at reducing iterative operation error compared with T-FMA.
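The effect described in items 2 and 3 can be illustrated in software. The following is a minimal sketch, not the thesis's hardware design or its test sequences: it models the recurrence acc = a*b + acc with exact rational arithmetic and rounds the accumulator to a chosen significand width after every step. The function names `round_to_bits` and `iterate_fma`, the operand values, and the choice to round the final result to 53 bits are all assumptions made for illustration. A width of 53 bits stands in for the T-FMA behaviour, and 106 bits stands in for the wide intermediate fed back in IM-FMA.

```python
from fractions import Fraction


def round_to_bits(x: Fraction, p: int) -> Fraction:
    """Round x to p significant bits (round-to-nearest-even, unbounded exponent)."""
    if x == 0:
        return Fraction(0)
    sign = -1 if x < 0 else 1
    m = abs(x)
    # Find e such that 2**e <= m < 2**(e + 1).
    e = m.numerator.bit_length() - m.denominator.bit_length()
    if Fraction(2) ** e > m:
        e -= 1
    ulp = Fraction(2) ** (e - p + 1)      # spacing of p-bit values near m
    # round() on a Fraction returns an int and rounds ties to even.
    return sign * round(m / ulp) * ulp


def iterate_fma(a, b, c0, n, keep_bits, final_bits=53):
    """Run acc = a*b + acc n times, keeping keep_bits between steps.

    keep_bits=53 mimics rounding the addend every iteration;
    keep_bits=106 mimics feeding back a wide unrounded intermediate.
    Rounding the final value to final_bits is an assumption of this sketch.
    """
    acc = Fraction(c0)
    for _ in range(n):
        acc = round_to_bits(a * b + acc, keep_bits)
    return round_to_bits(acc, final_bits)


# Hypothetical operands: each product equals 2**-53, half an ulp of 1.0.
a = Fraction(1, 2**27)
b = Fraction(1, 2**26)
n = 100

exact = 1 + n * a * b
narrow = iterate_fma(a, b, 1, n, keep_bits=53)    # T-FMA-like
wide = iterate_fma(a, b, 1, n, keep_bits=106)     # IM-FMA-like

rel_err_narrow = abs(narrow - exact) / exact
rel_err_wide = abs(wide - exact) / exact
print("narrow:", float(rel_err_narrow))
print("wide:  ", float(rel_err_wide))
```

With these made-up operands, the 53-bit accumulator never moves away from 1 because every update is a tie that rounds back to the even value, so the relative error grows in proportion to n and is roughly 1.1e-14 after 100 steps. The 106-bit accumulator holds every partial sum exactly, so its error is 0.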
[Degree-granting institution]: National University of Defense Technology
[Degree level]: Master's
[Year conferred]: 2012
[Classification number]: TP332

[Co-cited references]