Research on Key Technologies of Application-Specific Multi-Core Processor Architecture
Published: 2018-09-10 07:45
[Abstract]: With the rapid development of multimedia services and wireless communication technology, demand keeps growing for multimedia applications and for high-speed, reliable, seamless wireless communication. Application-specific multi-core processors for multimedia broadcasting and wireless communication are therefore becoming increasingly important. The rise of multi-core processors owes much to continuing advances in integrated-circuit manufacturing processes and to the growing maturity of architecture design. The processor is at the core of information processing in multimedia broadcasting and wireless communication. This project addresses the practical needs of future multimedia broadcasting and wireless communication algorithms, studies new processor architecture techniques, and combines the characteristics of processor architecture with the needs of the application domain in order to explore new solutions. Targeting the application requirements of multimedia broadcasting, wireless communication, and channel coding, this thesis proposes and designs an application-specific multi-core processor architecture. The main work is as follows:
1. The multi-core processor integrates 15 processor units and one shared-memory node unit on a single chip, based on a 2D-Mesh topology, and implements inter-core communication based on shared memory.
2. Taking the MIPS instruction set as a reference, a single-core processor compatible with the MIPS instruction set is designed and implemented. On the Kintex-7 series XC7K70T-2fbg676 hardware platform, ISE 14.6 reports a maximum operating frequency of 91.533 MHz and a minimum period of 10.925 ns. To achieve low overhead and low latency, each unit of the processor was first synthesized and implemented, and the arithmetic units were then implemented and optimized.
3. Based on look-ahead routing and speculation, a low-overhead, low-latency wormhole virtual-channel router that needs only a two-cycle pipeline is proposed. An input-port request masking module is added to the router, and requests are masked in two cases. To prevent the loss of data packets and improve the utilization of buffer resources, a credit-based flow control (CBFC) mechanism is proposed. Synthesis with ISE 14.6 gives a maximum operating frequency of 297.983 MHz and a minimum period of 3.356 ns. To verify the router's low overhead and low latency, several other commonly used router structures were designed and implemented, and the comparison shows the proposed router's advantages in latency and resource overhead. Finally, Splash-2 applications are used to emulate real workloads and obtain the router's average latency.
4. To connect the processing units to the low-overhead, low-latency router, a low-overhead, low-latency network interface based on the Wishbone bus is implemented. To further reduce its overhead and latency, the reconfigurable asynchronous FIFO unit is designed and optimized. The area and power overhead of the reconfigurable FIFO structure are obtained with the DC synthesis tool in a 180 nm process, and the overhead, frequency, and latency of the two network interfaces are obtained by synthesis with ISE 14.6.
5. For the completeness of the thesis, and drawing on the work of other members of the research group, mapping methods for an LDPC decoder, an H.264 decoder, and an FFT decoder onto the application-specific multi-core processor are studied.
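The abstract states that credit-based flow control is used to prevent packet loss and to improve buffer utilization, but it does not describe the mechanism itself. The following is a minimal Python sketch of the general technique, not the thesis's hardware design: the class name `CreditLink`, the parameter `buffer_depth`, and the zero-delay return of credits are all assumptions made for illustration.

```python
from collections import deque
from typing import Optional


class CreditLink:
    """Hypothetical model of one upstream/downstream router pair.

    The sender holds one credit per free slot in the receiver's input
    buffer. Credits are returned instantly here, which is a simplifying
    assumption; real hardware returns them after some delay.
    """

    def __init__(self, buffer_depth: int) -> None:
        self.credits = buffer_depth  # sender-side count of free downstream slots
        self.buffer = deque()        # receiver-side input buffer

    def send(self, flit: str) -> bool:
        """Forward a flit only when a credit is available."""
        if self.credits == 0:
            return False             # stall: the flit waits upstream, nothing is dropped
        self.credits -= 1
        self.buffer.append(flit)
        return True

    def drain(self) -> Optional[str]:
        """Receiver forwards one flit onward and returns its credit."""
        if not self.buffer:
            return None
        flit = self.buffer.popleft()
        self.credits += 1
        return flit


def demo() -> list:
    """Send three flits into a two-slot buffer, freeing a slot before the retry."""
    link = CreditLink(buffer_depth=2)
    results = [link.send("head"), link.send("body"), link.send("tail")]
    link.drain()                     # one slot is freed, one credit comes back
    results.append(link.send("tail"))
    return results
```

In this sketch a sender never writes into a full buffer, so no flit is lost, and it may keep sending until every downstream slot is occupied, which is the sense in which buffer space is fully usable.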
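[Degree-granting institution]: National University of Defense Technology
[Degree level]: Master's
[Year degree granted]: 2014
[Classification number]: TP332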
Article ID: 2233820
Article link: https://www.wllwen.com/kejilunwen/jisuanjikexuelunwen/2233820.html