Research on Training Optimization Strategies for the Recurrent Attention Model
Published: 2018-11-09 13:15
[Abstract]: In recent years, deep learning has achieved great success in computer vision, machine translation, speech recognition, and other fields, setting state-of-the-art results across many applications. However, the high accuracy of these models comes largely from the heavy computational cost invested at training and inference time. A major bottleneck of conventional deep learning is that the entire image must be processed, whereas human vision only needs to focus on the current region of interest, a trait that greatly reduces the "bandwidth" the human visual system requires. Although researchers in the vision community have proposed techniques such as reducing sliding windows to improve efficiency, the computational cost of deep models still grows in proportion to the input image size. To address this problem, this thesis introduces an attention mechanism that mimics this property of the human visual system. Current attention mechanisms fall into two main families: Soft Attention, a differentiable model based on saliency maps, and Hard Attention, which selects discrete attention locations to produce glimpse regions from which attention features are extracted. Starting from Hard Attention, this thesis proposes two optimization strategies for the Recurrent Attention Model (RAM), OV-RAM and an EM-based training algorithm, and evaluates them on the weakly labeled Translated MNIST and Cluttered MNIST datasets. The Recurrent Attention Model is built on an RNN: at each step it gazes at a different perceptual region, updates its hidden state, and makes decisions from the accumulated information. Because only a small region of interest is processed at each step, it is more computationally efficient than conventional deep networks. However, because RAM uses discrete, non-differentiable attention locations and relies on reinforcement learning to learn the location-selection policy, it trains slowly. Building on prior models, this thesis combines Soft Attention with Hard Attention by adding an Overview layer to RAM to provide context information, yielding the OV-RAM model. It also analyzes structural problems in RAM, re-derives the objective function from a supervised-learning perspective, and splits the two coupled parts so that an EM algorithm can be introduced for training. Finally, some failure cases are analyzed and remedies are suggested. Training and testing use the Translated MNIST and Cluttered MNIST datasets. Experimental results confirm that the proposed OV-RAM and EM algorithm effectively speed up the training of the Recurrent Attention Model: the same convergence accuracy is reached in fewer iterations, demonstrating the effectiveness of the two proposed optimization strategies.
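The glimpse-and-update loop the abstract describes can be sketched in a few lines. The following is a minimal, untrained numpy illustration, not the thesis's actual model: the dimensions, weight shapes, and the `glimpse` cropping scheme are all assumptions chosen for clarity, and the location is emitted deterministically here, whereas the real hard-attention model samples it and trains the policy with REINFORCE.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions for a Translated-MNIST-style input.
IMG, PATCH, HID, CLASSES, STEPS = 60, 8, 128, 10, 6

# Randomly initialized parameters; training is omitted in this sketch.
W_g = rng.normal(0, 0.1, (PATCH * PATCH + 2, HID))  # glimpse + location -> feature
W_h = rng.normal(0, 0.1, (HID, HID))                # recurrent hidden-state weights
W_l = rng.normal(0, 0.1, (HID, 2))                  # hidden state -> next gaze location
W_c = rng.normal(0, 0.1, (HID, CLASSES))            # hidden state -> class scores

def glimpse(img, loc):
    """Crop a PATCH x PATCH window centered at loc, with loc in [-1, 1]^2."""
    cy, cx = ((loc + 1) / 2 * (IMG - PATCH)).astype(int)
    return img[cy:cy + PATCH, cx:cx + PATCH].ravel()

def forward(img):
    h = np.zeros(HID)
    loc = np.zeros(2)                       # start the gaze at the image center
    for _ in range(STEPS):
        g = np.concatenate([glimpse(img, loc), loc])
        h = np.tanh(g @ W_g + h @ W_h)      # accumulate evidence in the hidden state
        loc = np.tanh(h @ W_l)              # propose the next gaze location
        # In the full hard-attention model, loc is sampled from a distribution
        # and the policy is trained with REINFORCE, which is the slow,
        # non-differentiable step the thesis's two strategies target.
    return int((h @ W_c).argmax())          # classify from the accumulated state

pred = forward(rng.random((IMG, IMG)))
print(pred)
```

Note that each step touches only a PATCH x PATCH window, so the per-step cost is fixed regardless of IMG, which is the efficiency argument the abstract makes against whole-image processing.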
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.41
Article ID: 2320514
[Similar literature]
Related master's theses (1 result):
1 Chen Shaopeng; Research on Training Optimization Strategies for the Recurrent Attention Model [D]; Harbin Institute of Technology; 2017
Link: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/2320514.html