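Design and Implementation of a Single-Precision Reciprocal Based on the First-Order Taylor Series Lookup-Table Method
Published: 2018-10-18 13:17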
[Abstract]: After analyzing the shortcomings of existing single-precision reciprocal algorithms in graphics processors, this paper designs a single-precision reciprocal algorithm based on a first-order Taylor series. Compared with traditional algorithms, it improves resource consumption, cycle count, and efficiency. The main logic of the floating-point reciprocal unit consists of a 24-bit integer adder, a ROM, and a 24-bit multiplier. The mantissa range [1,2) is divided evenly into 4096 intervals, the squared reciprocal of each interval's starting point is stored in a lookup table, and the reciprocal within each interval is computed with a first-order Taylor series. Simulation results agree with the theoretical results and meet the precision requirement of single-precision floating-point numbers. The algorithm has been taped out successfully and is used in the JM7200, a domestically developed (Chinese) third-generation graphics processor.
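[Author affiliations]: School of Physics and Microelectronics Science, Hunan University; School of Municipal and Surveying Engineering, Hunan City University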
[CLC number]: TP332
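The hardware listed in the abstract (one adder, one ROM holding squared reciprocals, one multiplier) is consistent with a rearranged first-order Taylor expansion. Expanding 1/x about the interval start x0 gives 1/x ≈ 1/x0 - (x - x0)/x0^2 = (2*x0 - x) * (1/x0^2), so a single table of 1/x0^2 is enough. The exact error of this approximation is (x - x0)^2 / (x * x0^2), which stays below 2^-24 when x0 >= 1 and x - x0 < 2^-12. The Python sketch below illustrates the idea under stated assumptions: it uses double-precision floats instead of a 24-bit fixed-point datapath, it handles only normal inputs, and the names TABLE, recip_mantissa, and recip_float32 are hypothetical. It is not the authors' implementation.

```python
import struct

N_BITS = 12            # 2**12 = 4096 intervals over [1, 2), as in the abstract
N = 1 << N_BITS

# Stand-in for the ROM: 1/x0**2 for each interval start x0 = 1 + i/4096.
# Assumption: entries are kept as Python doubles, not 24-bit fixed point.
TABLE = [1.0 / (1.0 + i / N) ** 2 for i in range(N)]


def recip_mantissa(m):
    """Approximate 1/m for a mantissa m in [1, 2)."""
    i = int((m - 1.0) * N)          # top 12 fraction bits select the interval
    x0 = 1.0 + i / N                # interval starting point
    # First-order Taylor: 1/m ~= (2*x0 - m) * (1/x0**2)
    # -> one add/subtract, one table read, one multiply.
    return (2.0 * x0 - m) * TABLE[i]


def recip_float32(x):
    """Approximate 1/x for a normal float32 value x (illustrative only)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    sign = bits >> 31
    exp = (bits >> 23) & 0xFF
    frac = bits & 0x7FFFFF
    if exp == 0 or exp == 0xFF:
        raise ValueError("zero, subnormal, infinity and NaN are not handled")
    m = 1.0 + frac / 2.0 ** 23      # mantissa with the implicit leading 1
    r = recip_mantissa(m)           # lies in (0.5, 1]
    # 1 / (m * 2**(exp-127)) = (1/m) * 2**(127-exp)
    result = r * 2.0 ** (127 - exp)
    return -result if sign else result
```

For example, recip_float32(3.0) extracts the mantissa 1.5, which is itself an interval start, so the Taylor correction term vanishes and only floating-point rounding remains. The result is returned as a Python float and is not rounded back to float32; a hardware version would also need rounding and special-case handling, which this sketch leaves out.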
[Citation]: 晏敏; 何欣; 李沙; 祝龙; 赵丽. Design and Implementation of a Single-Precision Reciprocal Based on the First-Order Taylor Series Lookup-Table Method [J]. Computer Engineering & Science, 2017(07).
Article ID: 2279241
Article link: https://www.wllwen.com/kejilunwen/jisuanjikexuelunwen/2279241.html