Research on Key Technologies of Chip Multiprocessor Systems Based on a Reconfigurable Platform

Published: 2018-04-23 14:01

Topic: chip multiprocessor + dual-mode fusion communication; Source: doctoral dissertation, Northeastern University (东北大学), 2013

[Abstract]: Constrained by power consumption and manufacturing technology, traditional single-core processors can no longer meet the demands of high-performance embedded applications by raising the clock frequency. Researchers have therefore turned to chip multiprocessor systems. Compared with a conventional multiprocessor system, a chip multiprocessor integrates its processing units on a single chip, which reduces communication cost, lowers power consumption, and further improves overall system performance. Chip multiprocessors are therefore the direction and inevitable trend of future computer development. As research has deepened, more and more applications have been mapped onto chip multiprocessor systems, but problems encountered in this process have strained existing system architectures. At the core of these problems is how to guarantee and improve the parallel efficiency of the system. To this end, this dissertation studies four key areas in depth (communication mechanisms, design models, routing algorithms, and topologies) and makes the following contributions:

(1) A dual-mode fusion communication mechanism. Inter-processor communication is a key factor in the performance of a chip multiprocessor system. To address the low communication efficiency of existing mechanisms, a dual-mode fusion communication mechanism is proposed. It classifies the data exchanged between processors, according to its characteristics, into control messages and data messages, and transmits each class over its own independent channel. Building on this mechanism, a replicate-and-divide task parallelization model is proposed, which replicates tasks in advance to reduce the scheduling overhead between processors at run time. The mechanism is implemented on a reconfigurable platform, and a parallel task design is carried out using a particle filter tracking algorithm as the example. Test results show that the dual-mode fusion communication mechanism significantly improves data exchange between processors, reduces parallel overhead, and improves the overall parallel efficiency of the system.

(2) A multi-level parallel design model. Designing an appropriate system architecture and task scheduling scheme for the application's requirements is the key to improving the performance of a heterogeneous chip multiprocessor system. Existing design models can increase parallelism, but they have not escaped the pattern of being serial at the macro level and parallel only locally. To address this, a multi-level parallel design model is proposed. It decomposes the design of a heterogeneous system into three levels (system level, transaction level, and statement level) and exploits task parallelism by going deeper layer by layer and decomposing step by step, improving overall system performance. Based on this model, an AVI video encoding and storage system is designed and implemented on a reconfigurable platform. Test results show that the multi-level parallel model effectively solves the design problem of heterogeneous chip multiprocessor systems and improves parallel efficiency.

(3) A congestion-aware, locally adaptive routing algorithm. Existing routing algorithms make poor use of the network topology, and packet routing is prone to local congestion. To address this, a congestion-aware, locally adaptive routing algorithm is proposed. It follows a rule of global dimension order with local adaptivity: congestion feedback signals are added between routing nodes to monitor the network state of the neighboring region, and the routing path is adjusted dynamically according to actual conditions. Theoretical analysis and simulation results show that the algorithm achieves high data throughput and strong adaptivity. Both the proposed algorithm and the XY routing algorithm are implemented on a reconfigurable platform. Comparative tests show that, with the proposed algorithm, multiple shortest paths are available for selection, which reduces the load on any single link. When the network becomes congested, packets can also bypass the congested region effectively, improving system parallelism.

(4) A topology based on the idea of halving. In a NoC-based chip multiprocessor system, the master node exchanges data with other nodes far more frequently than ordinary nodes exchange data with one another, yet current topology research has not optimized for this characteristic. To address this, a new topology, Half-Mesh, is proposed. By adding horizontal and vertical long links between the row and column head nodes and ordinary nodes, it shortens the distance between a head node and the central node of the same dimension, and thereby reduces the average path length of the whole NoC. For the Half-Mesh topology, the HTF-XY routing algorithm is proposed. It adopts a partitioned routing strategy that both shortens path lengths between nodes in different regions and improves routing adaptivity. A 7×7 Half-Mesh topology and the HTF-XY routing algorithm are implemented on a reconfigurable platform. Test results show that the Half-Mesh topology improves the head node's ability to exchange data with other nodes, reduces routing latency across the on-chip network, and improves system parallelism.
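As an illustration of the routing idea in contribution (3), the following Python sketch contrasts deterministic XY routing with a minimal adaptive variant that consults a congestion flag before choosing among shortest-path hops. It is not the dissertation's algorithm: the function names, the `congested` set standing in for the congestion feedback signals, and the tie-breaking rule are all assumptions made for illustration, and deadlock avoidance is not addressed.

```python
from typing import Callable, List, Set, Tuple

Node = Tuple[int, int]  # (x, y) coordinates of a router in a 2D mesh


def xy_next_hop(cur: Node, dst: Node) -> Node:
    """Deterministic XY routing: finish the X dimension, then the Y dimension."""
    (cx, cy), (dx, dy) = cur, dst
    if cx != dx:
        return (cx + (1 if dx > cx else -1), cy)
    if cy != dy:
        return (cx, cy + (1 if dy > cy else -1))
    return cur


def adaptive_next_hop(cur: Node, dst: Node, congested: Set[Node]) -> Node:
    """Hypothetical congestion-aware choice among shortest-path hops.

    Candidates are listed in dimension order (X first), so the result
    equals XY routing whenever no candidate is flagged as congested.
    """
    (cx, cy), (dx, dy) = cur, dst
    candidates: List[Node] = []
    if cx != dx:
        candidates.append((cx + (1 if dx > cx else -1), cy))
    if cy != dy:
        candidates.append((cx, cy + (1 if dy > cy else -1)))
    if not candidates:
        return cur  # already at the destination
    for hop in candidates:
        if hop not in congested:
            return hop
    return candidates[0]  # every productive hop is congested: keep dimension order


def trace(src: Node, dst: Node, next_hop: Callable[[Node, Node], Node]) -> List[Node]:
    """Follow next_hop from src to dst and return the visited nodes."""
    path = [src]
    while path[-1] != dst:
        path.append(next_hop(path[-1], dst))
    return path


if __name__ == "__main__":
    blocked = {(1, 0)}  # assumed congestion flag raised by a neighboring router
    print(trace((0, 0), (2, 2), xy_next_hop))
    print(trace((0, 0), (2, 2), lambda c, d: adaptive_next_hop(c, d, blocked)))
```

Because every candidate hop reduces the remaining distance by one, both traces have the same length; the adaptive trace simply steers around the flagged router where a second shortest path exists.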

[Degree-granting institution]: Northeastern University (东北大学)
[Degree level]: Doctoral
[Year degree granted]: 2013
[Classification number]: TP332
