Research on the Hierarchical Memory System of Multi-Core Processors
Topic of this thesis: multi-core processors + embedded applications; source: Fudan University, master's thesis, 2012
[Abstract]: In recent years, handheld consumer electronics such as tablets and smartphones have enjoyed unprecedented growth. As products are upgraded, ever-higher hardware configurations keep pushing power requirements up. For the processor, the core component of consumer electronics, the technical requirements are gradually shifting from high performance alone to high performance combined with low power consumption. At the same time, as the pace of process scaling slows, raising the clock frequency to gain performance has proved unsustainable, and multi-core architectures, with their inherent parallelism and flexibility, have become the mainstream processor architecture. For embedded applications, which are power-sensitive and highly varied, the inherent parallel processing capability, scalability, and potential for low power consumption of multi-core processors are especially well suited.
This thesis studies the hierarchical memory system of multi-core processors for embedded applications and, building on existing typical processor memory architecture designs, proposes a memory architecture better suited to embedded multi-core processors. The research goal is to use innovative design of the hierarchical memory architecture to address the high-performance and low-power requirements of embedded applications together, and so meet the technical requirements of those applications.
The contributions of this thesis can be summarized as follows:
(1) Cluster-based hierarchical memory system
This thesis proposes a class of hierarchical memory systems based on a cluster structure. Targeting the requirements of embedded applications, the memory system rebalances the weight of each of its levels: an extended register file design increases data locality, a cache-less design reduces the hardware overhead of the memory system, and partitioning data memory into private and shared parts improves data locality and strengthens the hierarchy of the memory system.
(2) Extended register file design
Within the cluster-based hierarchical memory system, this thesis proposes a register file extension scheme that remains compatible with the 32-bit instruction width. It doubles the number of registers to 64, which enhances data locality and improves overall processor performance. The thesis also makes novel use of the address-mapping space provided by the extended register file to improve and optimize the message-passing inter-core communication mechanism. Verification results show that the scheme can reduce the number of instructions related to inter-core communication by up to 50%, effectively improving inter-core communication efficiency.
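The abstract does not say how the extended registers are mapped. As a purely illustrative sketch, the toy model below assumes that a few of the upper registers are mapped to outgoing message ports, so that writing a result into one of them also sends it. The port register numbers, the `Core` class, and both instruction sequences are hypothetical and are not the instruction set described in the thesis.

```python
# Toy model: a 64-entry register file in which a few of the upper
# (extended) registers are mapped to outgoing message ports.
# The port register numbers and both instruction sequences are
# illustrative assumptions, not the ISA described in the thesis.

NUM_REGS = 64
PORT_REGS = {60: "core1", 61: "core2"}  # hypothetical register-to-port mapping


class Core:
    def __init__(self):
        self.regs = [0] * NUM_REGS
        self.sent = []       # (destination, value) pairs that left this core
        self.executed = 0    # number of instructions executed

    def write(self, rd, value):
        """Write a register; a write to a port-mapped register also sends."""
        self.regs[rd] = value
        if rd in PORT_REGS:
            self.sent.append((PORT_REGS[rd], value))

    def add(self, rd, rs, rt):
        """ADD instruction: rd = rs + rt."""
        self.executed += 1
        self.write(rd, self.regs[rs] + self.regs[rt])

    def send(self, dest, rs):
        """Explicit SEND instruction, used only by the baseline sequence."""
        self.executed += 1
        self.sent.append((dest, self.regs[rs]))


def baseline():
    c = Core()
    c.regs[1], c.regs[2] = 3, 4
    c.add(3, 1, 2)       # compute into an ordinary register
    c.send("core1", 3)   # then send it with a separate instruction
    return c


def mapped():
    c = Core()
    c.regs[1], c.regs[2] = 3, 4
    c.add(60, 1, 2)      # compute directly into a port-mapped register
    return c
```

In this toy case both sequences deliver the same value, but the mapped version uses one instruction where the baseline uses two. This shows only how a saving of that order could arise. It is not a reproduction of the thesis's measurement.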
(3) Cache-less design
Within the cluster-based hierarchical memory system, this thesis adopts a cache-less design inside the processor, replacing the cache with private memory units, which saves chip area and reduces system power consumption. The thesis also proposes a direct inter-core communication strategy based on the private memory units: by specifying the packet header format, message-passing inter-core communication can proceed without the participation of the processor core, further improving both inter-core communication efficiency and the computational efficiency of the processor.
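A minimal sketch of how a header can carry enough information for delivery without software on the receiving core is shown below. The abstract does not give the actual header format, so the 8-byte layout (destination core, payload length, offset into the destination's private memory) and the names `make_packet` and `deliver` are assumptions made for illustration.

```python
import struct

# Hypothetical 8-byte header, little-endian:
#   1 byte  destination core ID
#   1 byte  padding
#   2 bytes payload length in bytes
#   4 bytes byte offset into the destination core's private memory
HEADER = struct.Struct("<BxHI")


def make_packet(dest_core, dest_offset, payload):
    """Prefix the payload with a header describing where it should land."""
    return HEADER.pack(dest_core, len(payload), dest_offset) + payload


def deliver(packet, private_mems):
    """Model of the network interface: decode the header and write the
    payload straight into the destination private memory. No code runs
    on the destination core."""
    dest_core, length, offset = HEADER.unpack_from(packet)
    payload = packet[HEADER.size:HEADER.size + length]
    mem = private_mems[dest_core]
    if len(payload) != length or offset + length > len(mem):
        raise ValueError("malformed packet or write outside private memory")
    mem[offset:offset + length] = payload
    return dest_core
```

Here `private_mems` would be a list of `bytearray` objects, one per core. The point of the sketch is that the destination address travels with the data, so the receiving core does not have to execute a receive routine.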
(4) Intra-cluster shared memory unit
Within the cluster-based hierarchical memory system, this thesis designs a memory unit structure that can be shared by all processor nodes in a cluster and, on top of that structure, proposes a shared-memory inter-core communication mechanism with a corresponding mailbox synchronization mechanism. Partitioning memory into private and shared memory units improves data locality and reduces processor memory access latency.
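The mailbox idea can be sketched as follows, under assumptions: one mailbox per receiving core, which is either empty or records where a message sits in shared memory. The `Cluster` class, the sizes, and the one-message-per-mailbox protocol are hypothetical, and the model is sequential, so it shows only the ordering of the steps and not the hardware mechanism.

```python
# Toy model of a cluster's shared memory with one mailbox per receiving
# core. Sizes and the protocol are assumptions made for illustration.


class Cluster:
    def __init__(self, cores=8, shared_words=64):
        self.shared = [0] * shared_words
        # None means empty; otherwise (offset, count) of a pending message.
        self.mailbox = [None] * cores

    def post(self, dest, offset, words):
        """Sender: write the data to shared memory, then fill the mailbox."""
        if self.mailbox[dest] is not None:
            return False  # receiver has not consumed the previous message
        if offset + len(words) > len(self.shared):
            raise ValueError("message does not fit in shared memory")
        self.shared[offset:offset + len(words)] = words
        self.mailbox[dest] = (offset, len(words))
        return True

    def collect(self, core):
        """Receiver: check the mailbox, read the data, then clear the mailbox."""
        slot = self.mailbox[core]
        if slot is None:
            return None
        offset, count = slot
        data = self.shared[offset:offset + count]
        self.mailbox[core] = None
        return data
```

The sender fills the mailbox only after the data is in place, and the receiver clears it only after reading, so in this sketch neither side observes a half-written message.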
(5) Chip implementation and application example
A 16-core processor using this cluster-based hierarchical memory system was implemented in a TSMC 65 nm low-power CMOS process. The chip contains two clusters, each consisting of eight processor units and one intra-cluster shared memory unit. The chip area is 9.1 mm², of which a single processor core occupies 0.43 mm², and the maximum clock frequency is 750 MHz at a 1.2 V supply voltage. On this multi-core processor, we implemented a 3780-point fast Fourier transform (FFT) module to evaluate the performance gain from the hierarchical memory system and the actual power consumption. Test results show that the typical power consumption of a single processor core is 34 mW, significantly lower than that of other multi-core processors of the same type.
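A few derived figures follow from the numbers quoted above by simple arithmetic. The sketch below computes them. The total-power figure assumes all 16 cores draw the typical 34 mW at the same time, and the energy-per-cycle figure assumes that the typical power was measured at the 750 MHz maximum frequency. The abstract states neither.

```python
CORES = 16
CORE_AREA_MM2 = 0.43
CHIP_AREA_MM2 = 9.1
CORE_POWER_MW = 34.0
F_MAX_HZ = 750e6


def derived_figures():
    core_area = CORES * CORE_AREA_MM2
    return {
        # Total area occupied by the 16 processor cores.
        "core_area_mm2": core_area,
        # Share of the die taken by the cores.
        "core_area_fraction": core_area / CHIP_AREA_MM2,
        # Assumption: every core draws the typical power simultaneously.
        "all_cores_power_mw": CORES * CORE_POWER_MW,
        # Assumption: the typical power was measured at the maximum frequency.
        "energy_per_cycle_pj": CORE_POWER_MW * 1e-3 / F_MAX_HZ * 1e12,
    }
```

Under these assumptions the cores occupy about 6.88 mm², roughly 76% of the die. The 16 cores together draw about 544 mW, and each core spends about 45 pJ per clock cycle.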
[Degree-granting institution]: Fudan University
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: TP332
Article ID: 1853343
Article link: https://www.wllwen.com/kejilunwen/jisuanjikexuelunwen/1853343.html