Research on Key Technologies for the Design and Implementation of Embedded Multi-core Processors

Published: 2018-05-24 00:31

Topics: embedded multi-core processor + network-on-chip; Source: doctoral dissertation, Hefei University of Technology (合肥工业大学), 2012


[Abstract]: Embedded applications have expanded from their early home in industrial control to compute-intensive domains such as media processing and information processing, which places higher demands on the performance of embedded microprocessors. At the same time, as VLSI technology has advanced, raising the clock frequency alone is no longer a viable route to higher processor performance. Advanced processor architectures, with multi-core processors as the leading example, have become the main way to improve performance and meet steadily growing application requirements. Embedded multi-core processors have developed rapidly as process technology has progressed, but a series of scientific and technical problems remain to be solved. Research on the key technologies for designing and implementing embedded multi-core processors therefore has both theoretical and practical significance.
Synthetic Aperture Radar (SAR) is a typical compute-intensive embedded application with important uses in military, economic, and environmental fields. Taking real-time SAR imaging as its example, this dissertation explores multi-core architecture design methods for high-performance computing, focusing on architecture design and implementation, application acceleration, and application mapping. To meet the demand of high-performance embedded applications for high computing capability, it proposes a processor architecture model based on "task clusters" and designs an embedded multi-core processor architecture from that model. By examining how the communication performance of single-layer and hierarchical networks-on-chip relates to the communication characteristics of applications, it also designs a two-layer hybrid multi-core communication architecture and studies the choice of router type and the router architecture within it. The FFT is the main computation in SAR imaging; to accelerate it, the dissertation proposes a high-performance parallel FFT processing architecture. For cooperation among the chips of a multi-core chipset (a group of multi-core chips working together), it proposes a chipset-oriented task mapping algorithm and a general-purpose multi-core chip communication scheme. Finally, building on these results, a real-time SAR imaging embedded multi-core prototype system was designed to validate the work.
The main results of this dissertation are as follows:
1. A processor architecture model based on "task clusters" is proposed, and an embedded multi-core processor architecture is designed from it, with a two-layer hybrid communication architecture. To meet the demand of high-performance embedded applications for high computing capability, the task-cluster model raises the processor's computing capability by subdividing computing tasks and accelerating the regular ones. By examining how the communication performance of single-layer and hierarchical networks-on-chip relates to the communication characteristics of applications, the dissertation designs a hybrid hierarchical two-layer multi-core communication architecture. The new architecture provides ample on-chip communication bandwidth for the embedded multi-core processor while accommodating the diversity of application communication characteristics.
2. The communication performance of a circuit-switched router and of a wormhole-switched router with virtual channels is simulated and analyzed under different communication characteristics. The circuit-switched router sets up an end-to-end transmission link in advance; once the link is established, the flits (packet slices) arrive consecutively and in order, and the router occupies little area. Its communication performance is acceptable for long packets (several hundred flits) but poor for short packets (a dozen or so flits). The wormhole-switched router cannot guarantee that flits arrive consecutively and is slightly larger, but it performs very well for both long and short packets. These conclusions can guide the choice of router in network-on-chip design.
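The contrast between long and short packets can be illustrated with a first-order, zero-load latency model. This is a textbook-style sketch, not the simulation used in the dissertation: the function names, the per-hop delays (`t_setup_hop`, `t_hop`, `t_router`), and the hop count are all assumptions chosen for illustration, and contention between packets is ignored.

```python
def circuit_latency(hops, flits, t_hop=1, t_setup_hop=3):
    """Zero-load cycles for circuit switching (assumed model).

    A setup request travels to the destination and an acknowledgement
    returns before any data is sent; afterwards the flits stream
    back-to-back over the reserved path.
    """
    setup = 2 * hops * t_setup_hop          # round trip to reserve the path
    return setup + hops * t_hop + (flits - 1)


def wormhole_latency(hops, flits, t_router=3):
    """Zero-load cycles for wormhole switching (assumed model).

    The head flit pays the router pipeline delay at every hop and the
    body flits follow it in a pipelined fashion.
    """
    return hops * t_router + (flits - 1)


def circuit_to_wormhole_ratio(hops, flits):
    """Values above 1 mean circuit switching is slower at zero load."""
    return circuit_latency(hops, flits) / wormhole_latency(hops, flits)


if __name__ == "__main__":
    for flits in (16, 512):                 # "a dozen or so" vs "several hundred"
        print(flits, round(circuit_to_wormhole_ratio(hops=6, flits=flits), 3))
```

Under these assumed parameters the fixed path-setup cost is a large share of the total for a 16-flit packet and nearly negligible for a 512-flit packet, which is the same trend the item above reports.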
3. A circuit-switched router that supports virtual circuits is proposed. It addresses the low link utilization of existing circuit-switched routers. Experiments show that the new router design effectively reduces packet transmission latency and raises the saturation injection rate.
4. Using the constant-geometry FFT flow graph, a radix-2×K parallel FFT architecture free of memory access conflicts is proposed. Through a parallel address generation algorithm, the K radix-2 butterfly units read or write the 2K operands they need simultaneously, giving an average throughput of K radix-2 butterflies per cycle. Compared with existing parallel FFT architectures, the address mapping algorithm is easy to implement in hardware. The parallel address generation unit consists of one counter and a total of 4K two-to-one multiplexers. It is simple, and its structure is the same for different values of K, so FFT processors with different degrees of parallelism can readily be designed to match the required FFT speed, which gives good scalability. In terms of resource consumption, leaving twiddle factors aside, a constant-geometry FFT processor for an N-point FFT normally needs 2N storage locations, whereas the proposed FFT processor needs only 3N/2.
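For readers unfamiliar with the constant-geometry flow graph, the following is a minimal software sketch of the textbook radix-2 decimation-in-frequency form. It is not the dissertation's hardware design or its address generator: the function names are hypothetical, and the sketch uses two full ping-pong buffers (the conventional 2N storage), so it does not reproduce the 3N/2 result claimed above.

```python
import cmath


def constant_geometry_fft(x):
    """Radix-2 DIF FFT in which every stage uses the same access pattern.

    Each stage reads src[j] and src[j + N/2] and writes dst[2j] and
    dst[2j + 1]; only the twiddle factor changes from stage to stage.
    """
    n = len(x)
    stages = n.bit_length() - 1
    assert n == 1 << stages, "length must be a power of two"
    half = n // 2
    src = [complex(v) for v in x]
    for s in range(stages):
        dst = [0j] * n                       # second (ping-pong) buffer
        for j in range(half):                # iterations are independent,
            a, b = src[j], src[j + half]     # so K of them could run per cycle
            w = cmath.exp(-2j * cmath.pi * ((j >> s) << s) / n)
            dst[2 * j] = a + b
            dst[2 * j + 1] = (a - b) * w
        src = dst
    # Results come out in bit-reversed order; undo that for the caller.
    out = [0j] * n
    for k in range(n):
        rev = int(format(k, f"0{stages}b")[::-1], 2)
        out[k] = src[rev]
    return out


def naive_dft(x):
    """Direct O(N^2) DFT, used only as a reference for checking."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]
```

Because the read and write pattern is identical in every stage, the address logic in hardware can stay fixed while only the twiddle index varies, which is the property the item above builds on.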
5. For cooperation among the chips of a multi-core chipset, a chipset-oriented task mapping algorithm and a general-purpose multi-core chip communication scheme are proposed. Board-level interconnect buses offer little communication bandwidth, and the pin count of each chip limits the number of board-level data links, so the chipset-oriented task mapping algorithm is used to reduce the volume of task communication between chips effectively. For transferring packet data across the chipset, a multi-core chip communication scheme is also proposed. The scheme is general: it is not restricted by the number of multi-core chips, the topology, or the routing algorithm, and it is easy to implement in hardware.
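The abstract does not describe the mapping algorithm itself. As a stand-in, the sketch below shows the objective such an algorithm targets, the traffic that crosses chip boundaries, together with a generic greedy heuristic. The heuristic, the function names, the `capacity` limit, and the example traffic values are all assumptions for illustration and are not taken from the dissertation.

```python
def inter_chip_traffic(traffic, chip_of):
    """Sum the traffic of every task pair that is split across two chips."""
    return sum(v for (a, b), v in traffic.items() if chip_of[a] != chip_of[b])


def greedy_map(tasks, traffic, n_chips, capacity):
    """Place tasks one by one on the chip they already talk to the most.

    Ties go to the least-loaded chip, then to the lowest chip index.
    Raises ValueError if the tasks do not fit within n_chips * capacity.
    """
    chip_of = {}
    load = [0] * n_chips
    for t in tasks:
        def affinity(c):
            # Traffic between task t and tasks already placed on chip c.
            return sum(v for (a, b), v in traffic.items()
                       if (a == t and chip_of.get(b) == c)
                       or (b == t and chip_of.get(a) == c))

        candidates = [c for c in range(n_chips) if load[c] < capacity]
        best = max(candidates, key=lambda c: (affinity(c), -load[c], -c))
        chip_of[t] = best
        load[best] += 1
    return chip_of


if __name__ == "__main__":
    # Hypothetical 8-stage pipeline: each stage sends 10 units to the next.
    tasks = list(range(8))
    traffic = {(i, i + 1): 10 for i in range(7)}
    round_robin = {t: t % 2 for t in tasks}
    greedy = greedy_map(tasks, traffic, n_chips=2, capacity=4)
    print(inter_chip_traffic(traffic, round_robin),
          inter_chip_traffic(traffic, greedy))
```

In this toy pipeline a round-robin placement sends every edge across the board-level link, while the greedy placement keeps neighbouring stages on the same chip and leaves a single crossing edge.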
6. Building on the results above, a real-time SAR imaging multi-core prototype system is designed. It consists mainly of four Xilinx Virtex-6-550T FPGA chips plus memory, interface, and power management chips. All four FPGAs are designed with the embedded multi-core processor architecture proposed in this dissertation. The prototype processes radar echo data in a pipelined fashion; at an operating frequency of 80 MHz it produces a 4096 × 2048-point SAR image with 256 gray levels within 18 seconds, and the difference between its output image and the original image obtained on a PC is negligible, so the imaging quality is very good.
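The reported figures can be turned into a rough throughput estimate. The short calculation below uses only the numbers stated above (image size, clock frequency, and the 18-second bound); the variable names are illustrative, and because the abstract says "within 18 seconds" the results are bounds for the whole system rather than measured values for any single FPGA.

```python
WIDTH, HEIGHT = 4096, 2048        # image size in points, from the abstract
CLOCK_HZ = 80_000_000             # 80 MHz operating frequency
SECONDS_PER_IMAGE = 18            # upper bound on time per image

pixels = WIDTH * HEIGHT
pixels_per_second = pixels / SECONDS_PER_IMAGE            # lower bound on rate
cycles_per_pixel = CLOCK_HZ * SECONDS_PER_IMAGE / pixels  # upper bound on cost

print(f"{pixels} pixels, at least {pixels_per_second:.0f} pixels/s, "
      f"at most {cycles_per_pixel:.1f} clock cycles per pixel")
```

This works out to roughly 8.4 million output points per image and somewhat over 170 clock cycles of system time per point.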
[Degree-granting institution]: Hefei University of Technology (合肥工业大学)
[Degree]: Doctorate
[Year conferred]: 2012
[Classification]: TN957.52; TP332

Article link: https://www.wllwen.com/kejilunwen/jisuanjikexuelunwen/1926990.html

