Design and Implementation of a Floating-Point Fused Multiply-Add Unit in a High-Performance Microprocessor

Published: 2018-09-08 09:28

[Abstract]: The floating-point fused multiply-add (FMA) unit is one of the core arithmetic units of a high-performance microprocessor and strongly affects the processor's overall floating-point performance. The FMA algorithm is complex, its logic delay is long, and its area is large; it is also hard to verify and takes a long time to design. Research on high-performance FMA units therefore has broad application value and practical significance. This thesis studies design and optimization techniques for high-performance FMA units. The work is part of the national major project "High-Performance X Processor", and its results are applied directly in engineering practice. Based on a single-data-path FMA algorithm, without exception interrupts or a software co-processing (SWA) mechanism, and targeting high frequency, small area, and IEEE 754 compatibility, the thesis designs an FMA unit that supports denormalized numbers, signed zeros, infinities, and NaNs as both inputs and outputs. The main work and results are as follows:
1. High-performance FMA units and their key techniques are surveyed extensively, and on that basis the FMA unit of the High-Performance X Processor is designed and implemented.
2. A carry-correction structure for the multiplier array is proposed, and a main adder based on the EAC structure is designed, which reduces the number of logic levels in the FMA and increases its execution speed.
3. A simple LZA structure that supports denormalized numbers is designed using maximum normalization shift amount control and a flexible one-bit normalization correction technique. The exact-infinity operation and NaN data paths are merged into the aligned addend data path, and denormalized operand handling is folded into the normal normalized data flow, so that the mantissa-processing data path is shared as much as possible.
4. The whole design is modeled as a pipelined RTL implementation in the Verilog hardware description language. It passed test sets that include IEEE 754 standard test vectors, special operands, corner-case data, and a large number of random vectors, which supports the correctness of the design. Finally, the FMA unit was synthesized and optimized in a 40 nm bulk-silicon CMOS process. Under worst-case process conditions it reaches 2.5 GHz with an area of 56735.9 µm², which meets the design requirements of the X processor.
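The reason a fused unit matters for accuracy can be shown with a small software model. The sketch below is not the thesis's design; it is an illustration that assumes finite double-precision inputs and uses hypothetical function names. It computes a*b + c exactly with rational arithmetic and rounds once, which is the defining property of an FMA, and compares that with a separate multiply followed by an add, which rounds twice.

```python
from fractions import Fraction


def fused_multiply_add(a, b, c):
    """Illustrative model of an FMA: exact a*b + c, then a single rounding.

    Assumption: inputs are finite floats. Infinities, NaNs and the sign of
    an exact zero result are not modeled here.
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)  # one rounding to the nearest double


def separate_multiply_add(a, b, c):
    """Unfused version: the product is rounded before the addition."""
    product = a * b      # first rounding
    return product + c   # second rounding


# The exact product is 1 - 2**-60, which rounds to 1.0 as a double,
# so the unfused result loses the small term entirely.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

print("fused:   ", fused_multiply_add(a, b, c))
print("separate:", separate_multiply_add(a, b, c))
```

With these operands the separate version cancels to zero, while the fused version keeps the residual -2^-60. A hardware unit obtains the same effect by keeping the full-width product inside the data path until the final rounding stage.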
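[Degree-granting institution]: National University of Defense Technology
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP332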
