Research on Key Technologies of Memory Components for a Homogeneous General-Purpose Stream Multicore Processor
Published: 2018-10-18 19:22
[Abstract]: Growing application demands on processors keep driving the evolution of processor architecture and have given rise to new processor architectures. The multicore stream processor is a new type of multicore processor aimed at streaming data processing and stream applications, and it is built from a large number of simple cores. On compute-intensive applications it delivers high data throughput and high resource utilization, but it performs poorly on memory-intensive and sparse applications. Conventional multicore architectures suit memory-intensive and sparse applications, but their cache structures cannot efficiently capture the data locality of stream applications. To meet the combined requirements of stream applications and conventional multicore applications, and to merge the multicore stream processor with the conventional multicore processor, we propose a homogeneous general-purpose stream multicore processor architecture. Multiple homogeneous stream multicores are integrated on one chip, and each stream multicore can be configured, depending on the application, as part of a conventional multicore or as part of a stream multicore. The main difference between the two lies in the memory-access components: a conventional multicore relies mainly on an on-chip cache hierarchy, whereas a stream multicore uses register files and on-chip scratchpad memory (SPM). By configuring the shared on-chip storage inside each stream multicore, that is, by adjusting how much of it serves as scratchpad memory and how much as cache, the processor can be adapted to a range of application requirements. The cache serves conventional multicore applications by exploiting the temporal and spatial locality of their data, while the scratchpad memory mainly captures the producer-consumer locality of data in stream applications.
This thesis studies key technologies for the memory-access components of the stream multicore architecture in depth. The main work and contributions are as follows:
1. A configurable on-chip shared SPM/L2 cache structure is proposed. The homogeneous general-purpose stream processor targets both conventional applications and stream applications, and its basic building block, the stream multicore, runs in on-chip SMP execution mode or in SIMT execution mode depending on the target application. In each mode, the on-chip shared storage is configured accordingly to meet the processor's requirements on its memory components.
2. A data coherence protocol tailored to the on-chip cache structure of the stream multicore is designed. The private L1 cache of the stream multicore uses a write-through policy and the shared L2 cache uses a write-back policy; on this basis, coherence between the two cache levels is maintained by invalidating the copies of a modified cache line.
3. The private L1 data cache of each stream core is designed. Starting from the cache module of the MicroBlaze soft core, the data width is extended to 64 bits and logic supporting coherence maintenance is added, completing the design of the innermost cache level of the stream multicore architecture.
4. Using Xilinx's software development platform, behavioral simulation of the key logic of the stream multicore memory components is carried out, together with a performance analysis. Verification shows that all designs implement their intended functions, and the performance analysis demonstrates the effectiveness of the designs.
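The abstract does not say how the shared on-chip storage in contribution 1 is split between L2 cache and SPM. The Python sketch below is one way to picture it, assuming (hypothetically) that the split is made at the granularity of whole cache ways. The array size, associativity, line size, default per-mode split and the SPM address window (`spm_base`) are all illustrative values, not figures from the thesis, and the names `SharedArray`, `configure` and `serves` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ExecMode(Enum):
    SMP = "smp"    # conventional multicore execution mode
    SIMT = "simt"  # stream execution mode


@dataclass(frozen=True)
class SharedArray:
    # All sizes are hypothetical; the thesis abstract gives none.
    total_bytes: int = 256 * 1024
    num_ways: int = 8
    line_bytes: int = 64


@dataclass(frozen=True)
class Partition:
    l2_ways: int
    spm_ways: int
    l2_bytes: int
    spm_bytes: int
    sets: int  # unchanged by the split: only the L2 associativity shrinks


def configure(arr: SharedArray, mode: ExecMode, spm_ways: Optional[int] = None) -> Partition:
    """Assign whole ways of the shared SRAM either to the L2 cache or to the SPM."""
    way_bytes = arr.total_bytes // arr.num_ways
    if spm_ways is None:
        # Hypothetical defaults: SMP mode keeps every way as L2 cache,
        # SIMT mode gives three quarters of the ways to the scratchpad.
        spm_ways = 0 if mode is ExecMode.SMP else arr.num_ways * 3 // 4
    if not 0 <= spm_ways <= arr.num_ways:
        raise ValueError(f"spm_ways must be in [0, {arr.num_ways}], got {spm_ways}")
    l2_ways = arr.num_ways - spm_ways
    return Partition(
        l2_ways=l2_ways,
        spm_ways=spm_ways,
        l2_bytes=l2_ways * way_bytes,
        spm_bytes=spm_ways * way_bytes,
        sets=way_bytes // arr.line_bytes,
    )


def serves(part: Partition, addr: int, spm_base: int) -> str:
    """Route an access: addresses in the (hypothetical) SPM window skip the L2 tag lookup."""
    if spm_base <= addr < spm_base + part.spm_bytes:
        return "spm"
    return "l2"
```

Splitting by ways keeps the set index unchanged, so switching modes only changes how many ways the L2 tag lookup searches and how large the directly addressed SPM window is. This is a design choice assumed for the sketch, not one the abstract confirms.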
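The coherence scheme in contribution 2 (write-through private L1, write-back shared L2, invalidation of the other copies of a modified line) can be illustrated with a small functional model in Python. It is a sketch under stated assumptions, not the thesis design: the line size, no-write-allocate on L1 store misses, and broadcasting invalidations to every other L1 (rather than, for example, tracking sharers at L2) are hypothetical choices, and timing, replacement and concurrency are ignored. All class and method names are illustrative.

```python
from typing import Dict, List, Set

LINE_WORDS = 8  # hypothetical: 8 x 64-bit words per cache line


def line_of(addr: int) -> int:
    """Map a word address to its cache-line address."""
    return addr // LINE_WORDS


class L2WriteBack:
    """Shared L2: write-back, so memory is updated only when a dirty line is evicted."""

    def __init__(self, memory: Dict[int, int]):
        self.memory = memory
        self.lines: Dict[int, List[int]] = {}
        self.dirty: Set[int] = set()

    def _fill(self, line: int) -> List[int]:
        if line not in self.lines:
            base = line * LINE_WORDS
            self.lines[line] = [self.memory.get(base + i, 0) for i in range(LINE_WORDS)]
        return self.lines[line]

    def read_line(self, line: int) -> List[int]:
        return list(self._fill(line))  # each L1 receives its own copy

    def write_word(self, addr: int, value: int) -> None:
        self._fill(line_of(addr))[addr % LINE_WORDS] = value
        self.dirty.add(line_of(addr))  # memory is stale until write-back

    def evict(self, line: int) -> None:
        data = self.lines.pop(line, None)
        if data is not None and line in self.dirty:
            base = line * LINE_WORDS
            for i, word in enumerate(data):
                self.memory[base + i] = word
            self.dirty.discard(line)


class L1WriteThrough:
    """Private L1: every store is forwarded to L2, so L1 never holds dirty data."""

    def __init__(self, core_id: int, system: "StreamMulticore"):
        self.core_id = core_id
        self.system = system
        self.lines: Dict[int, List[int]] = {}

    def load(self, addr: int) -> int:
        line = line_of(addr)
        if line not in self.lines:  # miss: fetch a clean copy from L2
            self.lines[line] = self.system.l2.read_line(line)
        return self.lines[line][addr % LINE_WORDS]

    def store(self, addr: int, value: int) -> None:
        line = line_of(addr)
        if line in self.lines:  # update own copy; assumed no write-allocate on a miss
            self.lines[line][addr % LINE_WORDS] = value
        self.system.write_through(self.core_id, addr, value)

    def invalidate(self, line: int) -> None:
        self.lines.pop(line, None)


class StreamMulticore:
    def __init__(self, num_cores: int, memory: Dict[int, int]):
        self.l2 = L2WriteBack(memory)
        self.l1s = [L1WriteThrough(i, self) for i in range(num_cores)]

    def write_through(self, writer: int, addr: int, value: int) -> None:
        self.l2.write_word(addr, value)
        # Invalidate every other private copy of the modified line (broadcast, assumed).
        for l1 in self.l1s:
            if l1.core_id != writer:
                l1.invalidate(line_of(addr))
```

Because every L1 store reaches L2 immediately, L2 always holds the newest value of a line, so invalidating the other L1 copies is enough to keep the two levels coherent, and only L2 ever has to write dirty data back to memory.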
[Degree-Granting Institution]: National University of Defense Technology
[Degree]: Master's
[Year Conferred]: 2012
[Classification Number (CLC)]: TP332
Document No.: 2280127
Article link: https://www.wllwen.com/kejilunwen/jisuanjikexuelunwen/2280127.html