Research on Adaptive Reinforcement Learning Techniques Based on the Convex Polyhedra Abstract Domain
[Abstract]: Table-driven algorithms are an important class of methods for solving reinforcement learning problems. However, because of the "curse of dimensionality", they cannot be applied directly to reinforcement learning problems with continuous state spaces. Two families of methods address this difficulty: discretization of the state space and function approximation. Compared with function approximation, table-driven methods based on discretizing the continuous state space offer intuitive principles, simple program structure, and lightweight computation. The key for discretization-based methods is to find an appropriate discretization mechanism for the state space that balances computational cost against accuracy, and to ensure that numerical measures defined over the discrete abstract state space, such as the V and Q value functions, can faithfully evaluate the original reinforcement learning problem and compute the optimal policy π* accurately. This paper proposes an adaptive state-space discretization method based on the convex polyhedra abstract domain and implements the corresponding adaptive Q(λ) reinforcement learning algorithm, Adaptive Polyhedra Domain based Q(λ) (APDQ(λ)). Convex polyhedra are an abstract state representation widely used in evaluating the performance of stochastic systems and in verifying numerical properties of programs. An abstraction function maps the concrete state space to the abstract state space of the polyhedra domain, transforming the problem of computing an optimal policy over a continuous state space into a finite, tractable problem of computing an optimal policy over the abstract state space. Based on the sample-set information associated with each abstract state, several adaptive refinement mechanisms are designed, including BoxRefinement, LFRefinement, and MVLFRefinement.
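As a rough illustration of the abstraction-and-tabular-learning idea described above, the sketch below maps a concrete continuous state to a discrete abstract state and performs a tabular Q backup over abstract states. This is a minimal sketch under simplifying assumptions: a uniform box partition stands in for the polyhedral abstraction function, all class/function names and parameter values are hypothetical, and it is a plain one-step Q-learning update rather than the paper's APDQ(λ).

```python
# Illustrative sketch (NOT the paper's APDQ(λ) implementation): a uniform
# box partition plays the role of the abstraction function that maps a
# concrete continuous state to an abstract state, and tabular Q-values
# are kept per abstract state. All names and parameters are assumptions.

class BoxAbstraction:
    def __init__(self, lows, highs, bins):
        self.lows, self.highs, self.bins = lows, highs, bins

    def alpha(self, state):
        """Abstraction function: concrete state -> abstract state (index tuple)."""
        idx = []
        for x, lo, hi, n in zip(state, self.lows, self.highs, self.bins):
            # clamp into [lo, hi), then bucket uniformly into n cells
            t = min(max((x - lo) / (hi - lo), 0.0), 1.0 - 1e-12)
            idx.append(int(t * n))
        return tuple(idx)

# Mountain Car-style state space: position in [-1.2, 0.6], velocity in [-0.07, 0.07]
abs_map = BoxAbstraction(lows=[-1.2, -0.07], highs=[0.6, 0.07], bins=[8, 8])

Q = {}  # Q-table over abstract states: (abstract_state, action) -> value

def q_update(s, a, r, s_next, alpha_lr=0.1, gamma=0.9, actions=(0, 1, 2)):
    """One tabular Q-learning backup performed in the abstract state space."""
    sa, sa_next = abs_map.alpha(s), abs_map.alpha(s_next)
    best_next = max(Q.get((sa_next, b), 0.0) for b in actions)
    old = Q.get((sa, a), 0.0)
    Q[(sa, a)] = old + alpha_lr * (r + gamma * best_next - old)

q_update(s=(-0.5, 0.0), a=2, r=-1.0, s_next=(-0.49, 0.005))
```

The point of the sketch is that once the abstraction function is fixed, the continuous problem reduces to ordinary table-driven learning over finitely many abstract states; the paper's contribution lies in using polyhedra as the abstract states and refining them adaptively.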
Guided by these refinement mechanisms, the abstract state space is continually and adaptively refined, optimizing the discretization of the concrete state space and producing a statistical reward model consistent with the sample space obtained by online sampling. APDQ(λ) is implemented on top of the specialized polyhedra computation library PPL (Parma Polyhedra Library) and the high-precision numerical library GMP (GNU Multiple Precision), and a case study is carried out. Two typical continuous-state-space reinforcement learning problems, Mountain Car (MC) and the acrobatic robot (Acrobot), are selected as experimental subjects. The effects on the performance of APDQ(λ) of various reinforcement learning parameters and of the threshold parameters governing adaptive refinement are evaluated in detail, and the roles these parameters play in policy optimization while the abstract state space changes dynamically are explored. The experimental results show that when the discount rate γ is greater than 0.7, the algorithm exhibits good overall performance: the policy improves quickly in the initial stage and converges smoothly later (as shown in Figs. 6-13), and the algorithm adapts well to the learning rate α and to the various abstract-state-space refinement parameters. When the discount rate γ is less than 0.6, performance degrades rapidly. Applying abstract interpretation techniques in the statistical learning process is a promising approach to continuous reinforcement learning problems, and many issues merit further study, such as sampling based on approximate models and value-function updating.
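To make the adaptive refinement idea concrete, the sketch below shows one hypothetical refinement step: an abstract box is split along its widest dimension when the rewards sampled inside it vary too much, so the partition refines only where the statistics are inconsistent. This is a minimal sketch with assumed names and thresholds; it is not the paper's BoxRefinement, LFRefinement, or MVLFRefinement, which operate on polyhedra via PPL.

```python
# Illustrative sketch of one adaptive refinement step (hypothetical names
# and threshold, not the paper's refinement mechanisms): split a box-shaped
# abstract state when the reward samples it has collected are too varied.

def split_box(box, samples, var_threshold=0.5):
    """box: list of (lo, hi) intervals, one per dimension.
    samples: list of (state, reward) pairs observed inside the box.
    Returns [box] if its statistics are homogeneous enough, else two halves."""
    rewards = [r for _, r in samples]
    if len(rewards) < 2:
        return [box]
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    if var <= var_threshold:
        return [box]  # statistics are consistent: keep the box as-is
    # otherwise split along the widest dimension at its midpoint
    d = max(range(len(box)), key=lambda i: box[i][1] - box[i][0])
    lo, hi = box[d]
    mid = (lo + hi) / 2.0
    left, right = list(box), list(box)
    left[d], right[d] = (lo, mid), (mid, hi)
    return [left, right]

# A Mountain Car-style box whose two samples disagree strongly gets split:
boxes = split_box([(-1.2, 0.6), (-0.07, 0.07)],
                  samples=[((-1.0, 0.0), -1.0), ((0.4, 0.0), 10.0)])
```

Iterating such a step over all abstract states yields the kind of adaptive discretization described above: regions where the sampled reward model is already consistent stay coarse, while inconsistent regions are subdivided.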
【Author affiliations】: School of Computer Science and Technology, Soochow University; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education (Jilin University)
【Funds】: National Natural Science Foundation of China (61272005, 61303108, 61373094, 61472262, 61502323, 61502329); Natural Science Foundation of Jiangsu Province (BK2012616); Natural Science Research Program of Jiangsu Higher Education Institutions (13KJB520020); Project of the Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University (93K172014K04); Suzhou Applied Basic Research Program (SYG201422); Provincial Key Laboratory Fund of Soochow University (KJS1524); China Scholarship Council (201606920013); Natural Science Foundation of Zhejiang Province (LY16F010019)
【CLC number】: TP181
Article link: https://www.wllwen.com/kejilunwen/zidonghuakongzhilunwen/2393244.html