Research on Fault-Tolerance Techniques for a New Generation of High-Speed Interconnection Networks
Published: 2018-10-29 10:44
[Abstract]: In the development of high-performance computer systems, the steadily increasing speed of multi-core processors makes the bottleneck in data transmission between processors more and more prominent. High-frequency transmission improves transmission performance but also lowers the inherent reliability of individual components. At the same time, as high-performance computer systems grow in scale and the number of components in their high-speed interconnection networks keeps increasing, inherent reliability at the system level drops further. Using efficient, simple fault-tolerance techniques to improve system availability has therefore become an important topic in current research on high-speed interconnection networks. After successfully developing the interconnection network of the TH-1A high-performance computer system, the research group studied several key technologies for a new generation of high-speed interconnection networks, in response to the further development needs of high-performance systems and to trends at home and abroad. Against this background, this thesis focuses on fault-tolerance techniques for high-speed interconnection networks. The main work is as follows:
1. Taking current mainstream high-performance computer systems as examples, the thesis analyzes the topology, routing algorithms, and flow-control strategies of their interconnection networks, and summarizes the fault-tolerance techniques these systems use.
2. For the router design of the new-generation high-speed interconnection network, it proposes a router microarchitecture that supports adaptive routing and can accommodate multiple topologies and routing algorithms.
3. For the network structure design of the new-generation high-speed interconnection network, it targets a high-dimensional topology that combines a 3D-Torus with fully connected networks, and proposes two adaptive fault-tolerant routing algorithms: one based on channel ordering and one based on escape channels.
4. It analyzes the InfiniBand network management protocol and studies the algorithms and network management protocols needed for topology discovery, route computation, and path distribution in the new-generation high-speed interconnection network.
5. Using the OMNeT++ platform, it implements the corresponding router structure, interconnection network topology, fault-tolerant routing algorithms, and subnet management protocol, and carries out behavior-level simulation and performance evaluation of the system.
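The escape-channel idea named in item 3 can be illustrated with a minimal sketch. This is not the thesis's algorithm, which the abstract does not spell out: it is a generic illustration on a plain 3D torus only (no fully connected part), and the names `minimal_steps`, `neighbor`, `route`, `faulty`, and `busy` are hypothetical. A packet first tries any minimal direction whose adaptive channel is healthy and free; if none is available, it falls back to a dimension-order escape channel.

```python
from typing import Optional

Coord = tuple[int, int, int]
Link = tuple[Coord, Coord]


def minimal_steps(cur: Coord, dst: Coord, dims: Coord) -> list[int]:
    """Per dimension: +1 or -1 along the shorter way round the ring, 0 if aligned."""
    steps = []
    for c, d, k in zip(cur, dst, dims):
        delta = (d - c) % k
        if delta == 0:
            steps.append(0)
        elif delta <= k - delta:
            steps.append(1)
        else:
            steps.append(-1)
    return steps


def neighbor(cur: Coord, axis: int, step: int, dims: Coord) -> Coord:
    """Neighbor of `cur` one hop along `axis`, with torus wrap-around."""
    nxt = list(cur)
    nxt[axis] = (nxt[axis] + step) % dims[axis]
    return (nxt[0], nxt[1], nxt[2])


def route(cur: Coord, dst: Coord, dims: Coord,
          faulty: set[Link], busy: set[Link]) -> Optional[tuple[Coord, str]]:
    """Return (next_node, channel_class), or None if arrived or blocked.

    Assumptions (illustrative only): `faulty` holds failed directed links,
    `busy` holds links whose adaptive channel is currently occupied.
    """
    if cur == dst:
        return None
    candidates = [(axis, s)
                  for axis, s in enumerate(minimal_steps(cur, dst, dims))
                  if s != 0]
    # Adaptive class: any minimal direction that is healthy and free.
    for axis, s in candidates:
        nxt = neighbor(cur, axis, s, dims)
        if (cur, nxt) not in faulty and (cur, nxt) not in busy:
            return nxt, "adaptive"
    # Escape class: dimension-order routing on the lowest unresolved axis.
    # A real torus also needs extra virtual channels on the escape path to
    # break ring cycles, and a detour rule when this link is faulty; both
    # are omitted here.
    axis, s = candidates[0]
    nxt = neighbor(cur, axis, s, dims)
    if (cur, nxt) not in faulty:
        return nxt, "escape"
    return None
```

For example, on a 4x4x4 torus, calling `route((0, 0, 0), (1, 3, 0), (4, 4, 4), faulty, set())` with the link from `(0, 0, 0)` to `(1, 0, 0)` in `faulty` steers the packet through `(0, 3, 0)` on an adaptive channel instead.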
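[Degree-granting institution]: National University of Defense Technology
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP302.8
Article ID: 2297471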
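[References]
Related journal articles (first 2)
1 Hou Zonghao; Dong Xiaoshe; Zheng Shouqi; Huang Yongxiang; Qiao Nan. Management Architecture of IBA [J]. Computer Science (计算机科学), 2003, No. 07
2 Cao Zhiqiang; Jin Hong. Topology Discovery and Management of 4-ary N-tree InfiniBand Networks [J]. Computer Engineering (计算机工程), 2004, No. S1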
Article link: https://www.wllwen.com/kejilunwen/jisuanjikexuelunwen/2297471.html