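Design and Implementation of a Convolutional Neural Network Processor
Published: 2018-06-16 03:54
Topic: convolutional neural network + custom instructions; Reference: Xi'an University of Technology, 2017 master's thesis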
[Abstract]: The convolutional neural network (CNN) is an advanced deep learning architecture that is widely used in image recognition, speech recognition, natural language recognition, and other fields. CNNs are both data-intensive and compute-intensive. A traditional CPU platform cannot fully exploit the parallelism of a CNN, so computation is slow and the implementation cost is high. Dedicated CNN chips offer advantages in speed and cost, but they are poorly configurable and cannot flexibly adapt to the varying number of feature maps across CNN layers. Based on an analysis of the characteristics and problems of CNN algorithms, this thesis starts from the traditional general-purpose ZION processor and, by designing dedicated instructions and improving the architecture, designs a new CNN processor that balances parallel computing capability with flexibility. The main research contents are as follows:
1. Dedicated instruction design. First, a statistical analysis of the operation types in CNN algorithms shows that convolution, downsampling, and activation functions occur most frequently. Corresponding functional instructions are therefore designed, so that a single instruction completes a computation that would otherwise require multiple instructions. Second, vector memory-access instructions are designed to read or write multiple data items at once, which reduces the number of memory-access instructions and improves memory-access efficiency. Finally, the CNN-specific instruction system is completed following the rules of the RISC-V32 instruction set and its extension instructions.
2. Processor architecture design. On the basis of the general-purpose seven-stage pipelined ZION processor designed by the authors' research group, pipelined functional units that support the CNN-specific instructions are designed. Because the same convolution kernel is applied at different positions of the input feature map, the data it touches overlap; a reuse structure exploits this overlap to reduce the number of feature-map reads and lower the memory-access demand. In addition, to reduce the impact of memory-access latency on parallel computation, a double-buffer scheme caches the data of different feature maps in alternation, which reduces the idle time of the computing units and improves parallel efficiency.
Building on the instruction and architecture design, the pipelined functional units for the dedicated instructions are implemented in Verilog HDL, and the overall system design of a seven-stage pipelined CNN processor is completed and passes functional simulation. The CNN processor can run general-purpose algorithms and also accelerates CNN algorithms significantly. The processor was tested on a CNN algorithm using the MNIST handwritten digit database as the sample set. Compared with the general-purpose ZION processor, processing speed improves 6.955-fold and the speed-to-area ratio improves 3.398-fold.
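The data-reuse idea in the architecture design can be illustrated with a small software model. The sketch below is not the thesis's hardware design: it assumes a 28x28 input (the MNIST image size), a hypothetical 5x5 kernel, and a simple window register that keeps the overlapping columns when the kernel moves one step to the right. The function names and the read counters are made up for illustration; they only show why reuse lowers the number of feature-map reads.

```python
def conv2d_naive(fmap, kernel):
    """Valid 2-D convolution (CNN-style, no kernel flip); re-reads every window."""
    h, w, k = len(fmap), len(fmap[0]), len(kernel)
    reads = 0
    out = []
    for r in range(h - k + 1):
        row = []
        for c in range(w - k + 1):
            acc = 0
            for i in range(k):
                for j in range(k):
                    acc += fmap[r + i][c + j] * kernel[i][j]
                    reads += 1  # one feature-map read per multiply
            row.append(acc)
        out.append(row)
    return out, reads


def conv2d_window_reuse(fmap, kernel):
    """Same result, but a k x k window register keeps the overlapping columns."""
    h, w, k = len(fmap), len(fmap[0]), len(kernel)
    reads = 0
    out = []
    for r in range(h - k + 1):
        # Fill the window once at the start of each output row: k*k reads.
        window = [[fmap[r + i][j] for j in range(k)] for i in range(k)]
        reads += k * k
        row = []
        for c in range(w - k + 1):
            if c > 0:
                # Shift left by one column and fetch only the new column: k reads.
                for i in range(k):
                    window[i] = window[i][1:] + [fmap[r + i][c + k - 1]]
                reads += k
            row.append(sum(window[i][j] * kernel[i][j]
                           for i in range(k) for j in range(k)))
        out.append(row)
    return out, reads


# Assumed sizes: 28x28 input, 5x5 kernel; the values are arbitrary test data.
fmap = [[(3 * r + 5 * c) % 7 for c in range(28)] for r in range(28)]
kernel = [[i - j for j in range(5)] for i in range(5)]

out_naive, reads_naive = conv2d_naive(fmap, kernel)
out_reuse, reads_reuse = conv2d_window_reuse(fmap, kernel)
print(out_naive == out_reuse, reads_naive, reads_reuse)  # True 14400 3360
```

Under these assumptions the window register cuts feature-map reads from 14400 to 3360 while producing identical outputs. Reusing data across output rows as well, for example with line buffers, would reduce the count further; whether the thesis does so is not stated in the abstract.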
[Degree-granting institution]: Xi'an University of Technology
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP183; TP332
Article ID: 2025178
Article link: https://www.wllwen.com/kejilunwen/jisuanjikexuelunwen/2025178.html