A Reinforcement Learning Study of Visual-Behavioral Decision-Making in Pigeons
Published: 2019-04-08 15:28
[Abstract]: Behavioral decision-making (cognitive execution) is an essential skill that lets agents such as humans and animals survive under natural selection: the agent judges external information and uses that judgment to guide its choice of behavior. Vision is the main source of external information, accounting for more than 80% of all perceptual information. In nature, most of the visual-behavioral decisions on which agents depend for survival are acquired after birth through learning (reinforcement learning). Because of their powerful visual perception and a behavioral decision-making ability comparable to that of mammals, pigeons have become a typical model animal in visual cognition research. Studying reinforcement learning in pigeon visual-behavioral decision-making is therefore important for revealing the cognitive mechanisms of agents during decision-making; it helps explain the brain mechanisms of intelligent decision behavior and deepens the understanding of how the brain makes cognitive decisions.
Although research on pigeon visual-behavioral decision-making has made some progress, most work focuses on reinforcement learning under static rules. The experimental paradigms are oversimplified, and the models mostly use a fixed learning rate or a single reward matrix, so they cannot truly capture how an agent makes decisions when the rules of the environment change. In addition, the role of neurons in the nidopallium caudolaterale (NCL) during reinforcement learning remains unclear. This thesis therefore takes pigeons as experimental subjects, designs visual-behavioral decision-making paradigms with dynamic reinforcement rules, carries out behavioral training while synchronously recording electrical signals from NCL neurons, and analyzes both the pigeons' decision-making behavior and the NCL neuronal responses during dynamic reinforcement learning.
The main work of this thesis is as follows:
(1) Two visual-behavioral decision-making training paradigms under dynamic rules were designed: random reinforcement and reversal reinforcement. A hardware and software platform for behavioral training was built according to the planned experimental procedure, enabling automated training of pigeons based on specific reward and punishment information. Electrical signals from NCL neurons were recorded synchronously during reinforcement learning training, and the neuronal signals were preprocessed.
(2) A new dynamic reinforcement learning model was proposed by modifying the learning rate and the reward matrix of the classical Q-Learning model. The model was used to analyze the pigeons' behavioral feedback data from the two training procedures and was compared with the classical Q-Learning model. The error in predicting behavior fell by 46.98% and 30.55%, respectively, with the dynamic model, and the model's learning rate was found to reflect the pigeons' internal learning state at different training stages.
(3) Response features of NCL neurons at different training stages were extracted and analyzed statistically. Valid trials were screened, a suitable response time window was chosen, and the firing rate within that window was computed as the neuronal response feature. The Mann-Whitney test was used to assess whether the response features of NCL neurons differed significantly during reinforcement learning. The response features of some neurons (10/60) reflected the reward and punishment information in training, and those of others (21/60) carried information about the pigeons' learning state.
These results indicate that neurons in the NCL play different roles in reinforcement learning.
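The abstract does not give the equations of the dynamic model in item (2), so the following is only a minimal sketch of the general idea, not the thesis's method. It assumes a single-state, two-key task, in which the Q-Learning update reduces to Q(a) <- Q(a) + alpha * (r - Q(a)), and it swaps in a Pearce-Hall-style rule in which the learning rate alpha tracks the recent absolute prediction error. The names `run_model`, `eta`, and `beta`, and the example data, are hypothetical.

```python
import math


def softmax(q, beta):
    """Choice probabilities from Q-values (beta = inverse temperature)."""
    m = max(q)  # subtract the max for numerical stability
    exps = [math.exp(beta * (v - m)) for v in q]
    total = sum(exps)
    return [e / total for e in exps]


def run_model(choices, rewards, alpha0=0.3, eta=0.0, beta=3.0):
    """Replay a sequence of choices (0 or 1) and rewards (0 or 1).

    Returns the model's probability of each observed choice (computed
    before the update) and the learning rate after each trial.
    eta = 0 keeps alpha fixed (classical update); eta > 0 lets alpha
    follow the absolute prediction error (assumed rule, see lead-in).
    """
    q = [0.0, 0.0]
    alpha = alpha0
    p_chosen, alphas = [], []
    for action, reward in zip(choices, rewards):
        p_chosen.append(softmax(q, beta)[action])
        delta = reward - q[action]      # reward prediction error
        q[action] += alpha * delta      # single-state Q-learning update
        alpha = (1.0 - eta) * alpha + eta * abs(delta)
        alphas.append(alpha)
    return p_chosen, alphas


# Hypothetical reversal: key 0 pays off for 20 trials, then stops paying.
choices = [0] * 40
rewards = [1] * 20 + [0] * 20
_, fixed_alphas = run_model(choices, rewards, eta=0.0)
_, dynamic_alphas = run_model(choices, rewards, eta=0.3)
print("alpha just before / after reversal:", dynamic_alphas[19], dynamic_alphas[20])
```

In this sketch the learning rate decays while the rule is stable and rises again when the rule reverses, which is one way a learning rate could track a learner's internal state across training stages.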
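The analysis in item (3), a firing rate within a time window followed by a Mann-Whitney test, can be illustrated in the same hedged way. The sketch below uses only the Python standard library; the window bounds, spike times, and trial grouping are hypothetical, and the p-value uses the plain normal approximation without tie or continuity correction, which may differ from what the thesis used.

```python
import math


def firing_rate(spike_times, window_start, window_end):
    """Spikes per second inside [window_start, window_end), times in seconds."""
    count = sum(1 for t in spike_times if window_start <= t < window_end)
    return count / (window_end - window_start)


def mann_whitney_u(x, y):
    """U statistic of sample x against y by pairwise comparison (ties count 0.5)."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


def mann_whitney_p(x, y):
    """Two-sided p-value from the normal approximation (no tie correction)."""
    n1, n2 = len(x), len(y)
    u = mann_whitney_u(x, y)
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    return u, math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical spike trains (seconds after stimulus onset), one list per trial.
rewarded_trials = [[0.05, 0.12, 0.31], [0.02, 0.20, 0.25, 0.40], [0.10, 0.33, 0.45]]
unrewarded_trials = [[0.30], [0.15, 0.48], [0.22]]

# Response feature: firing rate in an assumed 0 to 0.5 s window.
rates_rewarded = [firing_rate(t, 0.0, 0.5) for t in rewarded_trials]
rates_unrewarded = [firing_rate(t, 0.0, 0.5) for t in unrewarded_trials]
u_stat, p_value = mann_whitney_p(rates_rewarded, rates_unrewarded)
print("U =", u_stat, "p =", p_value)
```

With samples this small an exact test would be more appropriate than the normal approximation; the sketch only shows how a per-trial firing-rate feature feeds into a rank-based comparison between trial groups.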
[Degree-granting institution]: Zhengzhou University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: Q42
Article ID: 2454693
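[Similar literature]
Related master's theses (top 5)
1 Tao Mengyan (陶梦妍); A Reinforcement Learning Study of Visual-Behavioral Decision-Making in Pigeons [D]; Zhengzhou University; 2017
2 Chen Xuemei (陈雪美); Identification of Place Cells in the Pigeon Hippocampus and Analysis of Place Field Distribution Characteristics [D]; Zhengzhou University; 2017
3 Li Shan (李珊); Construction of Spike Functional Networks and Decoding of Pigeon Turning Behavior [D]; Zhengzhou University; 2017
4 Yang Songling (杨松领); Design and Implementation of a Pigeon Maze Training System [D]; Zhengzhou University; 2017
5 Chen Yan (陈艳); Construction of Gamma Sub-band Functional Networks Based on Synchronization Likelihood and Decoding of Pigeon Turning Behavior [D]; Zhengzhou University; 2017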
Article link: https://www.wllwen.com/shoufeilunwen/benkebiyelunwen/2454693.html