A QAGKRL-Based Reinforcement Learning Method for Online Neural Decoding
Topic: brain-computer interface | Focus: neural decoding | Source: Master's thesis, Zhejiang University, 2017 | Document type: degree thesis
【Abstract】: A brain-computer interface (BCI) decodes the brain's neural activity into control signals for external devices, giving the brain a direct channel to its environment and offering a new route to rehabilitation for patients who have lost motor function. Current research on neural decoding falls into two broad categories, supervised learning and reinforcement learning. Compared with supervised decoding models, which require labeled training data, reinforcement-learning decoders have three advantages: (1) they need no recordings of the user's actual limb movements; (2) they let the user operate an external device dynamically by trial and error; and (3) they can adapt to changes in neuronal firing patterns. The brain is highly plastic, and changes in the environment inevitably alter firing patterns, so this adaptivity is important for stable BCI decoding.

We used two monkeys (B04 and B10) and the classic center-out paradigm to study the adaptive behavior of reinforcement-learning decoders and to compare them with a standard supervised method, the support vector machine (SVM). In the center-out task the monkey moves a cursor ball with a joystick to hit a target ball and earn a reward. The B04 neural data used for offline analysis were recorded from bilateral primary motor cortex (M1); the B10 data used for the online experiments were recorded from bilateral dorsal premotor cortex (PMd).

On the algorithm side, we first implemented attention-gated reinforcement learning (AGREL), which uses an artificial neural network trained by error back-propagation (BP), and then quantized attention-gated kernel reinforcement learning (QAGKRL), which uses a radial basis function (RBF) network. Unlike AGREL, which can become trapped in local minima, QAGKRL can reach a globally optimal solution for nonlinear neural decoding, and it uses quantization to compress the network topology and reduce computational complexity.
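This page does not include any of the thesis code, so the following is only a minimal Python sketch of the kind of decoder the abstract describes for QAGKRL: an RBF (kernel) hidden layer whose centers are built by quantizing incoming neural feature vectors, a softmax action layer over the discrete center-out targets, and a reward-gated update that adjusts only the chosen action's weights. The class name, hyperparameters, quantization rule, and update rule are illustrative assumptions, not the thesis implementation.

```python
import numpy as np

class KernelRLDecoder:
    """Illustrative QAGKRL-style decoder sketch (not the thesis code):
    RBF hidden layer with quantized centers + softmax action layer
    + reward-gated update of the selected action's weights."""

    def __init__(self, n_actions, sigma=1.0, quant_eps=2.0, lr=0.05):
        self.n_actions = n_actions
        self.sigma = sigma          # RBF kernel width (assumed value)
        self.quant_eps = quant_eps  # distance threshold for adding a new center
        self.lr = lr                # learning rate (assumed value)
        self.centers = []           # quantized kernel centers (the "codebook")
        self.W = np.zeros((0, n_actions))  # center-to-action weights

    def _quantize(self, x):
        """Add x as a new center only if it is far from all existing centers,
        which keeps the network topology compact."""
        if self.centers:
            d_min = np.linalg.norm(np.asarray(self.centers) - x, axis=1).min()
        else:
            d_min = np.inf
        if d_min > self.quant_eps:
            self.centers.append(x.copy())
            self.W = np.vstack([self.W, np.zeros(self.n_actions)])

    def _hidden(self, x):
        """RBF activations of all kernel centers for one neural feature vector."""
        d = np.linalg.norm(np.asarray(self.centers) - x, axis=1)
        return np.exp(-(d ** 2) / (2 * self.sigma ** 2))

    def act(self, x):
        """Softmax action selection; sampling provides trial-and-error exploration."""
        x = np.asarray(x, dtype=float)
        self._quantize(x)
        h = self._hidden(x)
        logits = h @ self.W
        p = np.exp(logits - logits.max())
        p /= p.sum()
        action = np.random.choice(self.n_actions, p=p)
        return action, h, p

    def update(self, h, p, action, reward):
        """Reward-gated update: only the selected (attended) action's weights
        change, scaled by a simple reward prediction error."""
        delta = reward - p[action]
        self.W[:, action] += self.lr * delta * h


# Illustrative usage with made-up numbers (96-channel spike-count bin, 4 targets).
decoder = KernelRLDecoder(n_actions=4)
x = np.random.rand(96)
a, h, p = decoder.act(x)
decoder.update(h, p, a, reward=1.0)  # reward = 1 if the movement was correct
```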
In the offline analysis we compared the methods on ten days of data. Overall, SVM outperformed QAGKRL and QAGKRL outperformed AGREL on pure classification, yet QAGKRL and AGREL reached accuracy close to the supervised method without any pre-training and without movement data. When a model fitted on sample one was tested on sample two (the two samples being neural data sets from two different days), the accuracy of QAGKRL and AGREL dropped but quickly recovered to the sample-one level, whereas SVM fell to chance level and could not recover.

For online brain control we adopted the shared control approach used in online BCI research, introducing a shared control parameter to help the monkey through the transition from hand control to brain control, as sketched below. We found that, through mutual adaptation with the external environment, the reinforcement-learning methods achieved higher online decoding accuracy than SVM, with QAGKRL again better than AGREL. As a comparison, when we cut off this mutual adaptation, the online accuracy of the reinforcement-learning methods fell below their average level and below that of SVM.
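The abstract does not spell out how the shared control parameter combines the two signals. A common choice in online BCI studies, shown below purely as an assumed illustration, is a linear blend of the decoded movement and the joystick movement, with the decoder's authority raised gradually as the animal shifts from hand control to brain control.

```python
import numpy as np

def shared_control_step(decoder_move, joystick_move, alpha):
    """Blend decoded and joystick movements for one cursor update.

    alpha = 0 is pure hand control, alpha = 1 is pure brain control; increasing
    alpha over sessions eases the hand-to-brain-control transition. The linear
    blend is an assumption about how the sharing is implemented, not the thesis method.
    """
    return alpha * np.asarray(decoder_move) + (1.0 - alpha) * np.asarray(joystick_move)

# Example with illustrative values: 70% decoder authority early in brain control.
cursor_step = shared_control_step(decoder_move=[1.0, 0.0],
                                  joystick_move=[0.6, 0.4],
                                  alpha=0.7)
```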
In summary, against the background of BCI research, this thesis used existing resources to build an online experimental platform, implemented a decoding module on it, and extended that module with three decoding algorithms: SVM, AGREL, and QAGKRL. Offline analysis first validated the algorithms and the platform; paradigm training and online experiments were then carried out, realizing a complete system in which the monkey controls the cursor ball by brain activity alone.
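The page gives no detail on how the decoding module hosts SVM, AGREL, and QAGKRL interchangeably. The sketch below shows one hypothetical way such a module could expose a common interface to the online loop; every name here is an assumption for illustration, not the platform's actual API.

```python
from typing import Protocol, Sequence

class Decoder(Protocol):
    """Hypothetical common interface so SVM, AGREL and QAGKRL can plug into one online loop."""
    def predict(self, spike_counts: Sequence[float]) -> int: ...
    def feedback(self, reward: float) -> None: ...   # RL decoders adapt; a supervised wrapper may ignore it

def online_decode_step(decoder: Decoder, get_spikes, move_cursor, reward_given):
    """One decoding step of a center-out trial: read a spike-count bin, decode a
    discrete movement direction, move the cursor, and feed the reward back."""
    x = get_spikes()                  # binned firing rates from the implanted array
    action = decoder.predict(x)       # one of the target directions
    move_cursor(action)
    decoder.feedback(1.0 if reward_given() else 0.0)
```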
【Degree-granting institution】: Zhejiang University
【Degree level】: Master's
【Year of award】: 2017
【Classification number】: TN911.7; R318