A GPU-Based High-Speed LDPC Decoding Architecture for Space Communications

Published: 2018-03-05 15:30

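Topic: low-density parity-check codes　Focus: graphics processing unit　Source: Acta Aeronautica et Astronautica Sinica (《航空学报》), 2017, Issue 01　Type: journal article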

[Abstract]: To meet the demand in space communications for high-speed, reconfigurable channel decoders, this paper exploits the parallel computing capability of graphics processing units (GPUs) to propose a high-speed software decoding architecture for low-density parity-check (LDPC) codes. The execution efficiency of the decoding kernel is improved in four ways: optimizing the intra-block and inter-block thread parallelism of the node-update operations in the turbo-decoding message-passing (TDMP) algorithm, reducing the thread divergence caused by irregular row weights, lowering the latency of thread accesses to the storage that holds node-update information, and suitably quantizing the information stored by the decoder. On this basis, the asynchronous stream mechanism of the Compute Unified Device Architecture (CUDA) is introduced. An optimized schedule for decoder input/output data transfers and kernel execution, together with a scheme for allocating decoding thread resources across CUDA streams, is designed to maximize decoding throughput while reducing decoding latency. TDMP decoding experiments on the LDPC codes of the Consultative Committee for Space Data Systems (CCSDS) telemetry standard, run on Nvidia's latest Tesla K20 and GTX980 platforms, show that with 10 decoding iterations the architecture reaches a peak throughput of about 500 Mbps and an average decoding latency of about 2 ms. Compared with existing results, the architecture balances decoding throughput and latency more effectively while retaining the configuration flexibility of a software implementation.
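[Affiliation]: School of Electronic and Information Engineering, Beihang University
[Funding]: National Natural Science Foundation of China (91438116)
[CLC number]: V443.1; TN911.22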
