Research and Application of a Machine-Learning-Based Game Intelligence System
Published: 2018-10-24 11:07
【Abstract】: In machine learning, training a decision-making system with a good control policy directly from high-dimensional perceptual data, such as vision and speech signals, remains a challenging problem. Before the Deep Q-Network (DQN) was proposed, successful applications of reinforcement learning relied mainly on hand-crafted features or policy representations, and the suitability of those features heavily influenced the final results. With the development of deep reinforcement learning, the DQN algorithm can learn a very good control policy in a stable manner directly from high-dimensional data and environment feedback, achieving strong performance on most games in the Atari environment. By combining the ability of convolutional neural networks to extract features directly from high-dimensional data with the Q-learning algorithm for training an action-value network, DQN offers a new approach to game intelligence. However, a number of challenges remain. First, DQN requires fully observed state information; when more than four frames are needed to represent the current state, as in 3D environments, it cannot obtain a good control policy. Sparse, delayed, and noisy reward signals are another problem: reinforcement learning must learn a control policy from such signals, but because of the strong correlation between samples and the defects of the reward signal, good results are often hard to obtain. This thesis exploits the long-term memory capability of LSTM (Long Short-Term Memory) networks together with an improved asynchronous training algorithm to design a game intelligence system based on deep neural networks, tailored to the experimental setting, and validates the decision-making ability the system learns in a 3D environment.
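The Q-learning rule that DQN uses to train the action-value network can be sketched in a few lines. This is a generic illustration of the standard temporal-difference target, not the thesis's implementation; the function name and parameters are mine.

```python
import numpy as np

def dqn_target(reward, next_q_values, done, gamma=0.99):
    """Q-learning target y = r + gamma * max_a' Q(s', a').

    reward        : float, immediate reward r
    next_q_values : 1-D array of Q(s', a') for all actions a'
    done          : bool, True if s' is terminal (no bootstrap term)
    gamma         : discount factor in [0, 1)
    """
    if done:
        return reward
    # Bootstrap from the greedy action value in the next state.
    return reward + gamma * float(np.max(next_q_values))
```

The network is then regressed toward this target, e.g. by minimizing `(dqn_target(...) - Q(s, a))**2` over sampled transitions.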
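The "strong correlation between samples" mentioned above is what experience replay, the standard DQN remedy, is designed to break: transitions are stored in a buffer and trained on in uniformly sampled minibatches rather than in the order they occurred. A minimal sketch, with illustrative names not taken from the thesis:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of (s, a, r, s', done) transitions."""

    def __init__(self, capacity=10000, seed=None):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling decorrelates consecutive environment steps.
        return self.rng.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```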
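The LSTM property the thesis relies on, carrying state across an arbitrary number of frames where DQN's fixed 4-frame stack falls short, can be shown with one recurrent step in plain NumPy. The weight layout (gates stacked as input, forget, output, candidate) is one common convention and an assumption here, not the thesis's architecture.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step. x: (D,) frame features; h_prev, c_prev: (H,) state;
    W: (4H, D+H) stacked gate weights; b: (4H,) biases."""
    H = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b  # all four gates at once
    i = sigmoid(z[0*H:1*H])                  # input gate
    f = sigmoid(z[1*H:2*H])                  # forget gate
    o = sigmoid(z[2*H:3*H])                  # output gate
    g = np.tanh(z[3*H:4*H])                  # candidate cell update
    c = f * c_prev + i * g                   # memory persists across steps
    h = o * np.tanh(c)
    return h, c

def encode_sequence(frames, W, b, H):
    """Unroll over a frame sequence; history accumulates in (h, c)."""
    h, c = np.zeros(H), np.zeros(H)
    for x in frames:
        h, c = lstm_step(x, h, c, W, b)
    return h
```

In a recurrent Q-network, the final `h` would replace the stacked-frame input as the state summary fed to the action-value head.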
【Degree-granting institution】: University of Electronic Science and Technology of China
【Degree level】: Master's
【Year awarded】: 2017
【CLC number】: TP311.52; TP18
Article ID: 2291214
Article link: https://www.wllwen.com/kejilunwen/zidonghuakongzhilunwen/2291214.html