Research on NoC Fault-Tolerant Routing Algorithms Based on Fault Models of Different Granularity

Published: 2018-10-16 14:36

【Abstract】: Fault tolerance is one of the key technologies that enable reliable data communication in a Network on Chip (NoC), and it is an important factor affecting network throughput and latency. Fault-tolerant routing algorithms are the main means of tolerating permanent faults in NoC fault-tolerance technology. Any study of fault-tolerant routing must rest on a fault model. By fault granularity, the fault models used by 2D Mesh NoC fault-tolerant routing algorithms fall into region fault models, single-fault models, and fine-grained fault models. However, current research on NoC fault-tolerant routing algorithms for fault models of different granularity still has shortcomings, chiefly in three respects: (1) routing algorithms for the single-fault model transmit data in a "one step, one comparison" manner, which leads to a high average latency across the network; (2) existing fine-grained fault models handle link faults too coarsely, so many working resources inside a node cannot be fully used; (3) routing algorithms for the fine-grained fault model handle channel faults with the traditional "detour" approach, which makes routing paths too long and lowers the overall performance of the NoC. To address these shortcomings, and taking into account the differences between NoCs and traditional computer networks, this thesis studies adaptive fault-tolerant routing for 2D Mesh NoCs under the single-fault model and the fine-grained fault model. The main contributions are as follows:

1. A low-latency adaptive NoC fault-tolerant routing algorithm based on the single-fault model is proposed. The algorithm abandons the "one step, one comparison" transmission used by traditional single-fault-model routing algorithms. Exploiting the characteristics of the 2D Mesh structure, it transmits packets using "skip-step comparison", and it uses a fuzzy priority strategy that is easy to implement with simple logic circuits to guarantee adaptivity. Experimental results show that the algorithm is a low-latency fault-tolerant routing algorithm suitable for large-scale NoCs; ISE synthesis results show that its hardware cost is low.

2. A micro-granularity fault model, an improvement on the fine-grained fault model, is established. To address the waste of working resources caused by the equivalent treatment of link faults in current fine-grained fault models, the micro-granularity fault model adds link faults to the recorded fault types so that link faults and channel faults can be effectively distinguished. It also adds buffer faults to the fault types it describes, which makes it possible to reuse buffer resources that would otherwise be discarded in the NoC. Using the available-resource rate as the performance metric, the thesis compares the micro-granularity fault model with two commonly used fine-grained fault models and analyzes the system-level hardware overhead of the three models. The results show that the micro-granularity fault model is cost-effective.

3. A fault-tolerant NoC routing algorithm for the micro-granularity fault model, based on buffer reuse, is proposed. Building on the micro-granularity fault model established in this thesis, the algorithm replaces the traditional "channel detour" fault tolerance with "channel substitution" fault tolerance. Channel substitution uses healthy channels inside a node that were discarded because of link faults to replace faulty channels. This raises the utilization of NoC buffer resources and the probability that a packet leaves through the optimal port, and it increases the number of routing paths available for "normal communication" across the NoC, thereby effectively improving network performance. Experimental and ISE synthesis results show that the algorithm is a highly reliable, cost-effective fault-tolerant routing algorithm suited to NoCs with high fault rates.
【Degree-granting institution】: PLA Information Engineering University
【Degree level】: Master's
【Year of degree】: 2013
【Classification number】: TN47;TN915.02;TP302.8
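
The channel-substitution idea described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: the abstract gives no implementation details, so the four-port router, the `PortState` record, both function names, and the reading of a "channel" as the input buffer attached to a port are all assumptions made for illustration only. The sketch contrasts a coarse model, in which a port is lost whenever its link or its buffer fails, with a model that records the two faults separately and lets a port with a healthy link borrow a healthy buffer from a port whose link has failed.

```python
from dataclasses import dataclass

# Hypothetical port names for one 2D Mesh router (local port omitted).
PORTS = ("N", "E", "S", "W")


@dataclass
class PortState:
    """Assumed per-port fault record: link and buffer are tracked separately."""
    link_ok: bool = True
    buffer_ok: bool = True


def usable_ports_fine_grained(ports):
    """Coarse view: a port is usable only if both its link and buffer work."""
    return {name for name, p in ports.items() if p.link_ok and p.buffer_ok}


def usable_ports_with_substitution(ports):
    """Separate-fault view with substitution (illustrative assumption).

    Returns a dict mapping each usable port to the port whose buffer it uses.
    Healthy buffers behind failed links act as spares for ports whose own
    buffer has failed but whose link still works.
    """
    spares = [n for n in PORTS if not ports[n].link_ok and ports[n].buffer_ok]
    assignment = {}
    for name in PORTS:
        p = ports[name]
        if not p.link_ok:
            continue  # no working link, so the port cannot forward packets
        if p.buffer_ok:
            assignment[name] = name  # port uses its own buffer
        elif spares:
            assignment[name] = spares.pop(0)  # borrow a spare buffer
    return assignment


# Example: north link failed (buffer healthy), east buffer failed (link healthy).
ports = {
    "N": PortState(link_ok=False),
    "E": PortState(buffer_ok=False),
    "S": PortState(),
    "W": PortState(),
}
print(sorted(usable_ports_fine_grained(ports)))   # ['S', 'W']
print(usable_ports_with_substitution(ports))      # {'E': 'N', 'S': 'S', 'W': 'W'}
```

In this toy case the coarse model leaves two usable ports, while the separate-fault model recovers the east port by reusing the north buffer, which is the kind of gain in usable output ports that the abstract attributes to channel substitution.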

【References】