Research on Key Technologies for Network-on-Chip Architecture Design and Performance Analysis

Published: 2018-03-06 05:24

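Topic: network-on-chip　Focus: wormhole switching　Source: National University of Defense Technology, 2015 doctoral dissertation　Document type: degree thesis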

[Abstract]: As semiconductor technology advances, more and more processing elements can be integrated on a single chip. The network-on-chip (NoC), the basic communication architecture that interconnects these processing elements, has attracted wide attention from both academia and industry. Hardware cost and communication performance are two key metrics for judging the usability of an NoC, and the core problem in this field is how to design an NoC architecture that meets the requirements of real applications at low hardware cost. This dissertation studies several key problems in NoC architecture design and performance analysis. The main contributions are as follows.

(1) A router microarchitecture that supports inter-port buffer and virtual channel sharing. The performance, power consumption, and hardware cost of a virtual-channel wormhole-switched NoC all depend on the number of virtual channels, the buffer capacity, and the buffer organization. To maximize NoC performance without introducing large hardware overhead or power burden, router buffer utilization must be as high as possible, which requires allocating and managing buffer resources dynamically to follow changes in NoC traffic. Existing schemes mostly focus on sharing buffer resources among router ports or among the virtual channels within a port. They ignore that different ports need different numbers of virtual channels and that these needs change over time. This dissertation therefore proposes a router microarchitecture with inter-port Adaptive Virtual Channel Sharing (AVCS), which adjusts at run time the buffer capacity and the number of virtual channels available to each router port according to changes in network traffic. First, the main problems of the classical router are analyzed and the basic AVCS router architecture is proposed to address them. Then, a low-cost shared-resource allocation algorithm is proposed for the AVCS router; it allocates the shared buffer resources to the ports on demand at run time. Finally, a request-port sharing scheme for the virtual channel allocator and the switch allocator is proposed for the AVCS router. In this scheme, the private virtual channels of each port and the corresponding shared virtual channels multiplex the same request ports of the virtual channel allocator and the switch allocator, which significantly reduces allocator size and hardware cost. The AVCS router offers high buffer utilization, low hardware overhead, and low average latency. Experimental results show that, compared with a classical router of the same performance, the AVCS router reduces power consumption by 32.1% and saves 11.7% of chip area.

(2) Analysis of end-to-end delay upper bounds in NoCs based on real-time calculus. Before a real-time application is deployed on an NoC-based multi-core processor system, it must be guaranteed that the worst-case delay upper bound of every data flow does not violate its deadline. To this end, three kinds of methods have been proposed for analyzing the end-to-end delay upper bounds of data flows: analysis based on deterministic network calculus, flow-based analysis, and link-based analysis. However, the flow-based and link-based methods apply only when router buffers are sufficiently large. The network-calculus-based method makes no assumption about buffer capacity, but the delay bounds it yields are pessimistic and need further improvement. This dissertation therefore proposes an end-to-end delay analysis algorithm based on real-time calculus that overcomes the applicability limits of the flow-based and link-based methods and further improves on the results of deterministic network calculus. First, a traffic model transformation theorem is proposed that converts a flit-level real-time calculus arrival curve into a packet-level real-time calculus arrival curve, which makes packet-level analysis of end-to-end delay upper bounds possible. Then, a real-time service curve model is established for wormhole-switched NoCs, and the equivalent real-time service curve of the credit-based flow controller is derived using properties of min-plus algebra. Finally, an end-to-end delay analysis algorithm is proposed on the basis of the packet-level real-time arrival curve model and the router real-time service curve model. Compared with existing methods, the algorithm supports both fixed-priority preemptive scheduling and round-robin scheduling, gives correct results for NoCs with limited buffer capacity, and yields tighter delay upper bounds.

(3) A buffer allocation algorithm for NoCs based on real-time calculus. Router buffer capacity strongly affects the performance, power consumption, and hardware cost of the whole NoC. To reduce the hardware cost of priority-based wormhole-switched NoCs, virtual channel sharing schemes and a link-based buffer allocation algorithm have been proposed. However, virtual channel sharing severely degrades communication performance and can cause deadlock under certain routing policies. The link-based buffer allocation algorithm preserves communication performance, but its allocations are overly conservative. This dissertation therefore proposes a buffer allocation algorithm based on real-time calculus to reduce the hardware cost of priority-based wormhole-switched NoCs. While guaranteeing the deadline constraints, the algorithm optimizes, in priority order, the buffer resources reserved for each data flow in the routers it traverses. First, based on real-time calculus, a sufficient condition is given under which a data flow triggers flow control at none of the routers it traverses; this condition is used to determine the initial values of the iteration for the whole buffer optimization algorithm. Then, a buffer allocation procedure is given that reduces the buffer resources a router reserves for each data flow, and it is combined with the initial-value theorem to obtain a buffer allocation algorithm that guarantees worst-case communication performance. Compared with existing buffer allocation algorithms, the proposed algorithm significantly reduces router hardware overhead and greatly reduces router hardware cost, power consumption, and chip area.

(4) A low-latency path selection algorithm and its fast verification. In a wormhole-switched NoC, the end-to-end delay of each data flow is affected by the transmission paths and traffic characteristics of the other data flows in the network. Therefore, when choosing a transmission path for a high-priority data flow, links that have little impact on low-priority data flows should be preferred, so as to optimize the end-to-end delay of every data flow as far as possible. Once the paths of all data flows have been determined, an efficient delay analysis method is needed to check quickly whether the delay constraint of every data flow is satisfied. If a data flow violates its delay constraint, a new transmission path should be tried for it. To meet these needs, this dissertation proposes a low-latency path selection algorithm for mesh networks that optimizes the end-to-end delay of each data flow. First, the algorithm assigns each link a weight according to how important the link is to low-priority data flows. In determining the link weights, it exploits basic properties of the mesh network and methods from combinatorics, which greatly reduces computational complexity and storage overhead relative to existing methods. Then, the algorithm uses Dijkstra's algorithm to select a suitable transmission path for each data flow. Finally, it uses the real-time-calculus-based delay analysis algorithm proposed earlier to check the delay constraint of each data flow. To speed up this check, several suggestions for optimizing the performance of the delay analysis algorithm are also given. Experimental results show that the proposed path selection algorithm significantly reduces the delay upper bounds of data flows.

In summary, this dissertation studies several important problems currently facing the NoC field, and it offers theoretical contributions and practical value for promoting the wider use of NoCs in chip multiprocessor systems.
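The path selection step in contribution (4) can be illustrated with a small sketch. The abstract does not state the weight formula, so the one below is an assumption made for illustration only: a link costs 1 plus, for every lower-priority flow, the fraction of that flow's minimal mesh paths that traverse the link (counted with binomial coefficients). All function names are hypothetical, and the delay-constraint check based on real-time calculus is not modeled.

```python
import heapq
from math import comb

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def minimal_paths(src, dst):
    """Number of minimal (shortest) paths between two nodes of a 2D mesh."""
    dx, dy = abs(dst[0] - src[0]), abs(dst[1] - src[1])
    return comb(dx + dy, dx)

def link_usage(src, dst, u, v):
    """Fraction of the minimal src->dst paths that traverse the link u->v."""
    if manhattan(src, u) + 1 + manhattan(v, dst) != manhattan(src, dst):
        return 0.0  # the link lies on no minimal path of this flow
    return minimal_paths(src, u) * minimal_paths(v, dst) / minimal_paths(src, dst)

def neighbors(node, width, height):
    x, y = node
    for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
        if 0 <= nx < width and 0 <= ny < height:
            yield (nx, ny)

def dijkstra(src, dst, width, height, weight):
    """Standard Dijkstra search over the mesh; weight(u, v) is the link cost."""
    dist, prev, heap = {src: 0.0}, {}, [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist[u]:
            continue  # stale heap entry
        for v in neighbors(u, width, height):
            nd = d + weight(u, v)
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]

def select_paths(flows, width, height):
    """flows: list of (src, dst) pairs ordered from highest to lowest priority."""
    paths = []
    for i, (src, dst) in enumerate(flows):
        lower = flows[i + 1:]  # flows with lower priority than flow i

        def weight(u, v, lower=lower):
            # Assumed weight: base hop cost plus expected lower-priority usage.
            return 1.0 + sum(link_usage(s, d, u, v) for s, d in lower)

        paths.append(dijkstra(src, dst, width, height, weight))
    return paths

# Example: a 3x3 mesh with one high-priority and one low-priority flow.
flows = [((0, 0), (2, 2)), ((0, 0), (2, 0))]
paths = select_paths(flows, width=3, height=3)
```

In this example the low-priority flow has a single minimal path along the bottom row, so the high-priority flow is steered away from those two links. A complete flow would follow this step with the delay analysis and retry path selection for any data flow that violates its constraint.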

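[Degree-granting institution]: National University of Defense Technology
[Degree level]: Doctoral
[Year degree conferred]: 2015
[Classification number]: TN47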

[Similar literature]