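Research on Hardware Implementation and Optimization Techniques for Deep Learning
Topics: deep learning + neural networks; Source: Harbin Institute of Technology, master's thesis, 2017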
[Abstract]: In recent years, with the rise of artificial intelligence, new intelligent algorithms represented by deep learning have been applied successfully in many engineering fields, including machine vision, image processing, and pattern recognition. Under the pressure of industrial big data, however, traditional software implementations cannot meet the practical engineering requirements of low cost, high timeliness, and high fault tolerance, so new solutions are urgently needed. The field-programmable gate array (FPGA), a widely used hardware development platform, offers large-scale distributed hardware resources together with a short development cycle, low power consumption, and good performance, which makes it well suited to implementing compute-intensive deep learning algorithms. This thesis uses the FPGA as its hardware development platform and studies hardware implementation and optimization techniques for deep learning. The main research contents are as follows.
First, the overall scheme for the hardware implementation of deep learning is designed. The theoretical foundations of deep learning are analyzed in detail. Taking the convolutional neural network as an example, the topology and functional characteristics of the network are studied, and the specific network topology implemented in hardware in this thesis is given. Based on the structural characteristics of that topology, the overall system scheme is designed and the network topology is mapped onto concrete hardware circuits.
Second, the optimization techniques and architecture design for porting the algorithm to hardware are completed. The FPGA is chosen as the hardware porting platform. In line with the goal of a low-power, high-efficiency deep learning implementation, the optimization techniques for hardware porting are studied in depth and then applied to design a parallel architecture for the convolutional neural network, from coarse-grained to fine-grained.
Third, the FPGA-based convolutional neural network is designed and implemented. With the FPGA as the hardware development platform, the overall architecture of the convolutional neural network is designed. According to the structural characteristics of the network, the functional circuit modules are designed, including the convolution module, the subsampling module, and the activation function module. A ping-pong buffer structure is designed, and the data transfer structure and data buffer unit are optimized. The functional correctness of each module is verified with the simulation software ModelSim.
Finally, the overall experimental platform is built. Based on the available experimental conditions, the network structure and parameters are configured, an "FPGA+CPU" heterogeneous system is designed, and the convolutional neural network is fixed in hardware. With handwritten digit recognition as the application, comparison experiments between the software and hardware implementations are carried out. Extensive experimental statistics show that the FPGA-based convolutional neural network designed in this thesis is functionally complete and delivers excellent performance.
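The abstract names three functional modules (convolution, subsampling, activation function) and a ping-pong buffer but gives no implementation details. The following Python sketch is a software reference model of how such pieces could fit together; it is not the thesis's circuit design. The sigmoid activation, the 2x2 average subsampling, the convolution-activation-subsampling order, and all names (`conv2d_valid`, `subsample2x2`, `PingPongBuffer`, `run_pipeline`) are assumptions made for illustration.

```python
# Illustrative software model only; names and parameters are hypothetical
# and do not come from the thesis.
import math


def conv2d_valid(image, kernel):
    """Sliding-window multiply-accumulate (stride 1, no padding, no kernel flip)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def activate(fmap):
    """Apply the (assumed) sigmoid activation element-wise."""
    return [[sigmoid(v) for v in row] for row in fmap]


def subsample2x2(fmap):
    """Average each non-overlapping 2x2 window (assumed subsampling type)."""
    return [
        [
            (fmap[r][c] + fmap[r][c + 1]
             + fmap[r + 1][c] + fmap[r + 1][c + 1]) / 4.0
            for c in range(0, len(fmap[0]) - 1, 2)
        ]
        for r in range(0, len(fmap) - 1, 2)
    ]


class PingPongBuffer:
    """Two banks: the producer fills one while the consumer reads the other."""

    def __init__(self):
        self.banks = [None, None]
        self.write_sel = 0  # index of the bank currently being written

    def write(self, data):
        self.banks[self.write_sel] = data

    def read(self):
        return self.banks[self.write_sel ^ 1]

    def swap(self):
        self.write_sel ^= 1


def run_pipeline(tiles, kernel):
    """Process tiles in order, loading tile k+1 while tile k is computed."""
    buf = PingPongBuffer()
    results = []
    for step in range(len(tiles) + 1):
        # The consumer takes the tile that was loaded in the previous step.
        ready = buf.read() if step > 0 else None
        # The producer loads the next tile into the other bank.
        if step < len(tiles):
            buf.write(tiles[step])
        if ready is not None:
            results.append(subsample2x2(activate(conv2d_valid(ready, kernel))))
        buf.swap()
    return results
```

In hardware the load and the compute in each step would run concurrently, which is the point of the two banks; the sequential loop here only models which bank each side touches. A model of this kind could also serve as the software side of a software-versus-hardware comparison.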
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP18