A Nonparametric Approximate Generalized Policy Iteration Reinforcement Learning Algorithm Based on State Clustering
Published: 2019-04-18 08:59
【Abstract】: To address two problems common to current approximate policy iteration reinforcement learning algorithms, namely their high computational cost and their inability to construct basis functions fully automatically, a nonparametric approximate generalized policy iteration reinforcement learning algorithm based on state clustering (NPAGPI-SC) is proposed. The algorithm collects samples through a two-stage random sampling process; computes the initial parameters of the approximator using a trial-and-error process and an estimation method that aims at complete coverage of the samples; adaptively adjusts the approximator during learning using the delta rule and the nearest-neighbor idea; and selects the action to execute with a greedy policy. Simulation results on the balance control of a single inverted pendulum verify the effectiveness and robustness of the proposed algorithm.
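As a minimal sketch of the nearest-neighbor and delta-rule ideas named in the abstract (not the authors' implementation), the following Python snippet shows one plausible reading: states are grouped around cluster centers, each cluster holds one Q-value per action, the cluster nearest to an observed state is updated with a delta rule toward a one-step bootstrapped target, and actions are chosen greedily. All names and parameters (centers, q, alpha, gamma) are illustrative assumptions.

```python
# Hypothetical sketch of a state-clustered Q approximator with a delta-rule update
# and greedy action selection; parameter names and structure are assumptions.
import numpy as np

class ClusteredQApproximator:
    def __init__(self, centers, n_actions, alpha=0.1, gamma=0.95):
        self.centers = np.asarray(centers, dtype=float)    # cluster centers built from sampled states
        self.q = np.zeros((len(self.centers), n_actions))  # one Q-value per (cluster, action) pair
        self.alpha = alpha                                  # delta-rule learning rate
        self.gamma = gamma                                  # discount factor

    def _nearest(self, state):
        # Nearest-neighbor idea: index of the cluster center closest to the state.
        return int(np.argmin(np.linalg.norm(self.centers - state, axis=1)))

    def greedy_action(self, state):
        # Greedy policy: pick the action with the highest Q-value at the nearest cluster.
        return int(np.argmax(self.q[self._nearest(state)]))

    def update(self, state, action, reward, next_state):
        # Delta rule: move Q(c, a) toward the one-step bootstrapped target.
        c, c_next = self._nearest(state), self._nearest(next_state)
        target = reward + self.gamma * np.max(self.q[c_next])
        self.q[c, action] += self.alpha * (target - self.q[c, action])
```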
【Affiliation】: Key Laboratory of Robot and Welding Automation of Jiangxi Province, Nanchang University
【Fund】: National 863 Program project (SS2013AA041003)
【CLC Number】: TP181
Article No.: 2459917
Article link: https://www.wllwen.com/kejilunwen/zidonghuakongzhilunwen/2459917.html