Research on Processing-in-Memory Based on a 2.5D Packaged System
Published: 2018-11-06 09:05
[Abstract]: In data-intensive applications, a large share of energy and latency is spent moving data between compute and memory units, which gives rise to the von Neumann bottleneck. The problem persists in systems integrated with 2.5D packaging. This paper therefore proposes a new hardware acceleration scheme that introduces processing-in-memory into the 2.5D system, giving the off-chip memory the ability to compute. The memory is divided into several banks that support parallel access across banks, and configurable acceleration units are placed in the memory array so that the array's bandwidth can be fully used for parallel computation, reducing the latency and energy cost of data transfer. The architecture is implemented with inverse quantization and inverse transform in H.264 decoding as a case study. Simulation results show that, compared with a conventional software implementation, the scheme achieves a 7.1x performance improvement and saves 80.5% of the energy, while adding only 2% area overhead.
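The abstract names H.264 inverse quantization and inverse transform as the accelerated kernel but gives no implementation details. Purely as an illustration of the kind of computation involved, the following is a minimal Python sketch of the 4x4 inverse core transform in its commonly described butterfly form. It is not the paper's hardware design: the function names `_butterfly` and `inverse_transform_4x4` are hypothetical, and the sketch assumes the input coefficients have already been dequantized.

```python
def _butterfly(d0, d1, d2, d3):
    """One 1-D pass of the 4-point inverse transform (adds and shifts only)."""
    e0 = d0 + d2
    e1 = d0 - d2
    e2 = (d1 >> 1) - d3
    e3 = d1 + (d3 >> 1)
    return [e0 + e3, e1 + e2, e1 - e2, e0 - e3]


def inverse_transform_4x4(coeffs):
    """Apply the 1-D pass to each row, then each column, then round.

    `coeffs` is a 4x4 list of already-dequantized integer coefficients
    (an assumption of this sketch). Returns a 4x4 list of residual values.
    """
    # Horizontal pass: transform every row.
    rows = [_butterfly(*row) for row in coeffs]
    # Vertical pass: transform every column of the row-transformed block.
    cols = [_butterfly(*[rows[r][c] for r in range(4)]) for c in range(4)]
    # Final rounding: add 32 and shift right by 6; cols[c][r] is row r, column c.
    return [[(cols[c][r] + 32) >> 6 for c in range(4)] for r in range(4)]


if __name__ == "__main__":
    # A block with only a DC coefficient yields a flat residual block.
    dc_only = [[64, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
    for row in inverse_transform_4x4(dc_only):
        print(row)
```

Each 4x4 block is transformed independently of the others, which is the kind of data-level parallelism that per-bank acceleration units could plausibly exploit. How the paper actually maps blocks to banks is not stated in the abstract.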
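[Author affiliations]: State Key Laboratory of ASIC and System, Fudan University; SYSU-CMU Joint Institute of Engineering, Sun Yat-sen University; SYSU-CMU Shunde International Joint Research Institute, Guangdong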
[Funding]: SYSU-CMU Shunde International Joint Research Institute project (20150303); Samsung Electronics industry-sponsored project (SLSI-201403DD013)
[Classification number]: TN405
Article ID: 2313838
Article link: https://www.wllwen.com/kejilunwen/dianzigongchenglunwen/2313838.html