
Research, Implementation, and Inter-Chip Extension of an On-Chip Multi-Core Synchronization Unit

Published: 2019-01-27 09:31
[Abstract]: With growing application demands and advances in chip fabrication technology, more processor and memory resources can be integrated on a single chip, and systems-on-chip are gradually evolving from single-core to multi-core structures. Multi-core architectures improve performance, but they also place higher demands on inter-core synchronization: fully exploiting the processing power of every core requires efficient synchronization support. X-DSP is a high-performance multi-core DSP developed in-house by our university, with a self-designed architecture and instruction set architecture. It targets fields with large-volume data processing demands, such as signal and image processing. The chip integrates multiple DSP cores and a global cache, and communicates with off-chip devices at high speed through a PCIE interface. Its multi-core structure supports the parallel execution of multiple tasks, and data communication among these tasks requires an efficient synchronization mechanism to ensure correct and efficient execution. Based on the system architecture of X-DSP, this thesis implements multi-core synchronization with distributed hardware synchronization units. In addition, so that off-chip processor cores can participate effectively in synchronization, it extends the interface over PCIE by designing and implementing a PCIE-NI bridge. The main contents and contributions of this thesis are as follows:
(1) Hardware and software synchronization schemes are analyzed and compared, and a hardware synchronization mechanism based on locks and barriers is adopted. Synchronization efficiency is improved by reducing the impact of synchronization operations on normal memory accesses.
(2) Taking the architectural characteristics of X-DSP into account, an overall structure for a distributed hardware synchronization unit containing hardware locks and barriers is designed. The hardware lock has two operating modes, spin lock and queued spin lock, which effectively reduce the number of lock-acquisition requests. The hardware barrier is released by broadcast, which reduces the network hotspots caused by the serial release of traditional barriers.
(3) A PCIE-NI bridge is designed that performs protocol conversion between the standard AXI interface, the PBUS and DBI interfaces, and the NI interface designed in-house for X-DSP. This enables off-chip processor cores to participate effectively in synchronization and allows data to be shared between on-chip and off-chip resources.
(4) Module-level verification is completed using a hierarchical verification methodology, followed by system-level verification in the full-chip system environment and joint testing of the hardware synchronization unit and the PCIE-NI bridge. Logic synthesis results show that the design meets the performance requirements.
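Contribution (2) states that the hardware lock supports a spin-lock mode and a queued-spin-lock mode, and that queuing reduces the number of lock-acquisition requests. The Python sketch below is a hypothetical cycle-level behavioral model, not the X-DSP RTL: it assumes every core requests the lock at cycle 0, each holder keeps the lock for `hold_cycles`, ties go to the lowest core id, and in spin mode a refused core re-sends its request every `retry_interval` cycles. The names `spin_mode` and `queued_mode` are illustrative only.

```python
from collections import deque


def spin_mode(n_cores: int, hold_cycles: int, retry_interval: int = 1):
    """Spin mode (assumed model): a refused core re-sends its request later.

    Returns (requests seen by the lock unit, grant order, cycle of final release).
    """
    holder, release_at = None, 0
    next_try = {core: 0 for core in range(n_cores)}  # waiting core -> next attempt cycle
    requests, grant_order, cycle = 0, [], 0
    while next_try:
        if holder is not None and cycle >= release_at:
            holder = None  # holder leaves its critical section and releases the lock
        for core in sorted(next_try):  # lowest core id wins ties (assumed arbitration)
            if next_try[core] != cycle:
                continue
            requests += 1  # every attempt is a separate request to the lock unit
            if holder is None:
                holder, release_at = core, cycle + hold_cycles
                grant_order.append(core)
                del next_try[core]
            else:
                next_try[core] = cycle + retry_interval  # refused: retry later
        cycle += 1
    return requests, grant_order, release_at


def queued_mode(n_cores: int, hold_cycles: int):
    """Queued mode (assumed model): each core sends one request; the lock unit
    buffers refused requests and hands the lock to the queue head on release."""
    pending = deque(range(n_cores))  # all requests arrive at cycle 0, queued FIFO
    requests = n_cores  # no retries, so exactly one request per core
    grant_order, cycle = [], 0
    while pending:
        grant_order.append(pending.popleft())  # grant without any new request
        cycle += hold_cycles  # next grant happens when this holder releases
    return requests, grant_order, cycle


print(spin_mode(4, 3))
print(spin_mode(4, 3, retry_interval=2))
print(queued_mode(4, 3))
```

Under these assumptions, four cores with a 3-cycle critical section send 22 requests in spin mode and only 4 in queued mode, and both finish at cycle 12. Backing off to a 2-cycle retry interval cuts spin-mode traffic to 16 requests but delays the final release to cycle 15, which is the trade-off a queued mode avoids.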
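Contribution (2) also says the hardware barrier is released by broadcast instead of serially. The sketch below is a hypothetical timing model, not the thesis design: it assumes the barrier trips when the last participant arrives, that sending a release packet takes `send_cycles`, that a serial release sends one point-to-point packet per core back to back from a single node, and that a broadcast release counts as one injected packet. `barrier_release` is an illustrative name.

```python
def barrier_release(arrivals: list[int], broadcast: bool, send_cycles: int = 1):
    """Return (release cycle of each core, release packets injected by the barrier node).

    arrivals[i] is the cycle at which core i reaches the barrier (assumed model).
    """
    trip = max(arrivals)  # barrier opens once the last participant has arrived
    n = len(arrivals)
    if broadcast:
        # One broadcast packet releases every participant in the same step.
        return [trip + send_cycles] * n, 1
    # Serial release: one packet per core, sent one after another from the
    # same node, so that node's output link carries all n packets (hotspot).
    return [trip + send_cycles * (i + 1) for i in range(n)], n


print(barrier_release([5, 2, 7, 4], broadcast=True))
print(barrier_release([5, 2, 7, 4], broadcast=False))
```

For arrivals at cycles 5, 2, 7 and 4, the broadcast variant releases all four cores at cycle 8 with a single packet, while the serial variant spreads the releases over cycles 8 to 11 and injects four packets from one node. In this model both the release skew and the per-node packet count grow linearly with the number of participants under serial release.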
[Degree-granting institution]: National University of Defense Technology
[Degree level]: Master's
[Year of degree]: 2015
[Classification number]: TP332

Article ID: 2416136

Article link: https://www.wllwen.com/kejilunwen/jisuanjikexuelunwen/2416136.html

