Research on Optimal Control Based on Approximate Dynamic Programming and Its Application in Power Systems

Published: 2018-07-13 13:32

[Abstract]: Optimal control based on approximate dynamic programming (ADP) has been one of the most active research topics in the control field in recent years. Drawing on ideas from reinforcement learning, ADP uses function approximation structures to approximate the cost function and the control policy in the dynamic programming equation so that the principle of optimality is satisfied, thereby obtaining the optimal cost function and the optimal control policy. ADP thus avoids the "curse of dimensionality" that arises when dynamic programming is used to solve optimal control problems, and it has attracted wide attention for that reason. However, ADP theory and its algorithms are not yet mature, and many theoretical and technical problems in using ADP for the optimal control of dynamic systems remain open. To this end, with support from the National Natural Science Foundation of China project "Dynamic Global Optimization and Energy-Saving Control Theory for Smart Grids and Its Applications" (50977008), among others, this thesis further studies several optimal control problems for dynamic systems on the basis of ADP theory and proposes iterative ADP algorithms suited to different settings. It also applies ADP to power systems, extending the method's range of application. The main work and contributions of the thesis are as follows:

1. For the optimal tracking control problem of unknown continuous-time linear systems, a new ADP-based optimal tracking control scheme is proposed. First, the optimal tracking problem of the original system is transformed into an optimal regulation problem for an augmented system, and the optimal control solution of the augmented system is proved to be equivalent to the standard solution of the original optimal tracking problem. Then, a new online ADP algorithm is given that solves the augmented algebraic Riccati equation online, so that the optimal tracking controller of the unknown system is obtained online.

2. An ADP-based adaptive optimal control scheme is proposed that effectively solves the optimal control problem for a class of discrete-time affine nonlinear systems. Two neural networks, called the critic network and the action network, serve as online parametric structures that approximate the cost function and the optimal control law, respectively. Taking the neural network approximation errors into account, Lyapunov theory is used to prove that the system state and the weight estimation errors of the networks are uniformly ultimately bounded, and that the resulting control input lies in a small neighborhood of the optimal control input.

3. For the H∞ control problem of a class of discrete-time nonlinear systems with external disturbances, a new online adaptive policy learning scheme is proposed. Three neural networks serve as online parametric structures for the critic network, the action network, and the disturbance network, and online update laws for the network weights are given. Taking the neural network approximation errors into account, Lyapunov theory is used to prove that the system state and all network weight estimation errors are uniformly ultimately bounded, and that the resulting control input lies in a small neighborhood of the optimal control input.

4. A new iterative two-stage DHP algorithm is proposed that solves the optimal control problem for a class of nonlinear switched systems with saturating actuators. A nonquadratic functional is used to handle the actuator saturation constraint, which guarantees that the control function is smooth within the saturation bounds, and a novel iterative two-stage DHP algorithm is derived to solve the constrained HJB equation. A rigorous mathematical proof guarantees the convergence of the proposed algorithm.

5. For the optimal tracking control problem of a class of discrete-time nonlinear switched systems, an iterative ADP algorithm is designed to obtain the optimal hybrid tracking control policy. First, the optimal tracking control problem is transformed into an optimal regulation problem for an error switched system. Second, a new iterative two-stage ADP algorithm is given to solve the HJB equation of the error system. Finally, a convergence analysis of the algorithm is given, which guarantees that the hybrid tracking control policy obtained is optimal.

6. An iterative two-stage ε-ADP algorithm is designed that effectively solves the finite-horizon optimal control problem for a class of discrete-time nonlinear switched systems. First, an iterative two-stage ADP algorithm is given to solve the HJB equation, together with a rigorous convergence analysis of the iteration. Then, an ε-optimal control policy is given that allows the iterative two-stage ADP algorithm to obtain, in a finite number of steps, an approximately optimal cost function that is within an error bound ε of the optimal value, thereby achieving finite-horizon optimal control of discrete-time nonlinear switched systems.

7. For the load frequency control problem of unknown power systems, an ADP-based design scheme for an online H∞ robust load frequency controller is proposed. First, the H∞ control method is used to handle the uncertainty of the system. Then, two-player zero-sum differential game theory is used to solve the H∞ control problem, and, using ADP techniques and Kronecker product theory, a data-based online ADP algorithm is given. The algorithm learns the solution of the game algebraic Riccati equation from online measurements of the system state and control input, and thereby solves the load frequency control problem for completely unknown power systems.
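As a rough illustration of the iterative ADP idea summarized above, the following sketch runs value iteration on a small, fully known discrete-time linear-quadratic problem. It is not any of the thesis's algorithms, which address unknown, nonlinear, and switched systems. With a quadratic cost function V_i(x) = x'P_i x, the iteration V_{i+1}(x) = min_u [x'Qx + u'Ru + V_i(Ax + Bu)] reduces to a Riccati recursion on P_i, and stopping once successive iterates differ by less than a tolerance loosely mirrors the ε-stopping idea in item 6. The matrices A, B, Q, R, the tolerance eps, and all function names are assumptions chosen for illustration.

```python
import numpy as np


def greedy_gain(P, A, B, R):
    """Gain K of the policy u = -K x that is greedy with respect to V(x) = x' P x."""
    BtP = B.T @ P
    return np.linalg.solve(R + BtP @ B, BtP @ A)


def value_iteration_lqr(A, B, Q, R, eps=1e-10, max_iter=10_000):
    """Value iteration for discrete-time LQR, started from V_0 = 0.

    Each sweep applies P_{i+1} = Q + A' P_i A - A' P_i B K_i, where K_i is the
    greedy gain for P_i. Stops when the largest entry-wise change is below eps.
    """
    n = A.shape[0]
    P = np.zeros((n, n))
    for i in range(1, max_iter + 1):
        K = greedy_gain(P, A, B, R)
        P_next = Q + A.T @ P @ A - A.T @ P @ B @ K
        P_next = 0.5 * (P_next + P_next.T)  # keep the iterate symmetric
        if np.max(np.abs(P_next - P)) < eps:
            return P_next, i
        P = P_next
    raise RuntimeError("value iteration did not converge within max_iter sweeps")


# Hypothetical example system: a double integrator sampled at 0.1 s.
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.005],
              [0.1]])
Q = np.eye(2)          # state weighting (assumed)
R = np.array([[1.0]])  # control weighting (assumed)

P, iters = value_iteration_lqr(A, B, Q, R)
K = greedy_gain(P, A, B, R)
print("sweeps:", iters)
print("P =\n", P)
print("K =", K)
```

Starting from V_0 = 0 matches the usual initialization of value-iteration ADP. In the model-free settings the thesis targets, the explicit use of A and B would be replaced by quantities estimated from measured state and input data.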
[Degree-granting institution]: Northeastern University (东北大学)
[Degree level]: Doctoral
[Year degree conferred]: 2014
[Classification number]: O221;TM711

[Similar literature]