
Research on Maintenance Strategies for Deteriorating Systems Based on Reinforcement Learning

Published: 2018-05-22 11:41

Topics: deteriorating system + maintenance strategy; Source: master's thesis, Hefei University of Technology, 2011


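【Abstract】: In industrial production, the state of a production system deteriorates continuously under the influence of operating time and its environment, and its efficiency and performance gradually decline. Once they fall to the point where the system can no longer meet working requirements, the system is regarded as failed even if it can still operate, and such failures cause large economic losses. Preventive maintenance applies one or a series of maintenance actions to detect or eliminate a hidden or latent fault, keeping the system in good working condition and avoiding failure; this has an important effect on reducing production costs and on industrial production as a whole. How to schedule the maintenance of a production system so that it avoids operating in a high-cost state, and how to improve the system's reliability and safety, is therefore an important research topic.

Building on reinforcement learning, the thesis first addresses the maintenance of deteriorating systems with discrete states and formulates it as a continuous-time Semi-Markov Decision Process (SMDP) model. To avoid getting trapped in a local optimum, the problem is solved with an algorithm that combines Q-learning with Simulated Annealing (SA), which yields a good maintenance policy for the system. Optimization results under the average and discounted performance criteria are obtained through simulation, and the effect of the inspection interval on the results is discussed.

The thesis also considers partially observable deteriorating systems, in which inspections are subject to error, so the observer cannot fully determine the system state and must make decisions from incomplete information. For the discrete-state, continuous-time case, a Partially Observed Semi-Markov Decision Process (POSMDP) model is established and solved with two reinforcement learning algorithms, Sarsa(λ) and NSM, taking a memoryless and a memory-based approach respectively; optimization results under the average performance criterion are obtained. The relationship between the inspection interval and the average cost is also discussed and agrees with the fully observable results. Finally, the choice of the parameter k in the NSM algorithm is discussed, and the conclusions are consistent with practical conditions.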
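To make the combination of Q-learning and simulated annealing more concrete, here is a minimal Python sketch on a small, entirely hypothetical deteriorating system; it is not the thesis's model, parameters, or code. The assumptions are: four deterioration levels (level 3 = failed), exponentially distributed times between level changes, state-dependent operating cost rates, a fixed inspection interval `TAU`, and two actions (continue or maintain). Only the discounted criterion is shown. The SMDP update discounts the next value by `exp(-BETA * sojourn)`, and the annealing part is one common variant: a Metropolis rule that accepts a worse random action with probability `exp(-delta / T)` while `T` cools geometrically. The names `step`, `greedy`, and `sa_q_learning` are illustrative.

```python
import math
import random

# Hypothetical model parameters (illustrative assumptions, not the thesis's data)
N_STATES = 4                         # 0 = like new, 1-2 = degraded, 3 = failed
CONTINUE, MAINTAIN = 0, 1
COST_RATE = [1.0, 2.0, 5.0, 50.0]    # operating cost per unit time in each level
PM_COST, CM_COST = 20.0, 60.0        # preventive / corrective maintenance cost
DEGRADE_RATE = 0.5                   # rate of the exponential time to the next level
TAU = 1.0                            # inspection interval (time between decisions)
MAINT_TIME = 0.5                     # duration of a maintenance action
BETA = 0.1                           # continuous-time discount rate


def step(state, action, rng):
    """Simulate one decision epoch; return (next_state, discounted_cost, sojourn)."""
    if action == MAINTAIN:
        # Maintenance restores the system to level 0; its cost is paid up front.
        cost = CM_COST if state == N_STATES - 1 else PM_COST
        return 0, cost, MAINT_TIME
    # Operate until the next inspection, integrating exp(-BETA*t) * cost rate exactly.
    t, cost = 0.0, 0.0
    while True:
        failed = state == N_STATES - 1
        jump = math.inf if failed else rng.expovariate(DEGRADE_RATE)
        t_end = min(t + jump, TAU)
        cost += COST_RATE[state] * (math.exp(-BETA * t) - math.exp(-BETA * t_end)) / BETA
        if t_end >= TAU:
            return state, cost, TAU
        t, state = t_end, state + 1


def greedy(q_row):
    """Action with the lowest estimated discounted cost."""
    return min((CONTINUE, MAINTAIN), key=lambda a: q_row[a])


def sa_q_learning(steps=100_000, alpha=0.05, t0=50.0, decay=0.9999, t_min=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # zero start is optimistic for costs
    state, temp = 0, t0
    for _ in range(steps):
        best = greedy(q[state])
        candidate = rng.choice((CONTINUE, MAINTAIN))
        # Metropolis rule: always accept a no-worse candidate, accept a worse
        # one with probability exp(-delta / T).
        delta = q[state][candidate] - q[state][best]
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            action = candidate
        else:
            action = best
        nxt, cost, sojourn = step(state, action, rng)
        # SMDP Q-learning target: discount the next value by the sojourn time.
        target = cost + math.exp(-BETA * sojourn) * min(q[nxt])
        q[state][action] += alpha * (target - q[state][action])
        state, temp = nxt, max(t_min, temp * decay)  # geometric cooling
    return q


q = sa_q_learning()
print([greedy(row) for row in q])   # learned action per deterioration level
```

With these made-up costs the learned policy should keep a like-new system running and repair a failed one; what happens at the intermediate levels depends on the chosen numbers. Re-running with different values of `TAU` is one way to imitate the inspection-interval study the abstract describes.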
[Abstract]: Owing to operating time and environmental influences, the state of a production system deteriorates and its efficiency and performance decline gradually. When performance falls so low that the system can no longer meet working requirements, the system is regarded as failed even though it can still run, and such failures cause large economic losses. Preventive maintenance uses one or a series of maintenance actions to detect or eliminate a hidden or latent fault, keeping the system in good working condition and avoiding failure; this is important for reducing production costs and for industrial production. How to schedule the maintenance of a production system so that it does not run in a high-cost state, while improving its reliability and safety, is therefore an important research topic. Based on reinforcement learning, the thesis first builds a continuous-time Semi-Markov Decision Process (SMDP) model for the maintenance of deteriorating systems with discrete states. To avoid getting trapped in a local optimum, the problem is solved with an algorithm that combines Q-learning with Simulated Annealing (SA), yielding a good maintenance policy for the system. Optimization results under the average and discounted performance criteria are obtained by simulation, and the effect of the inspection interval on the results is discussed. The thesis also considers partially observable deteriorating systems, in which inspection errors prevent the observer from fully determining the system state, so decisions must be based on incomplete information. For the discrete-state, continuous-time problem, a Partially Observed Semi-Markov Decision Process (POSMDP) model is established. The reinforcement learning algorithms Sarsa(λ) and NSM are used to solve it from the memoryless and the memory-based perspectives respectively, and optimization results under the average performance criterion are obtained. The relationship between the inspection interval and the average cost is also discussed and agrees with the results obtained under full observability. Finally, the choice of the parameter k in the NSM algorithm is discussed, with conclusions consistent with practical conditions.
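The memoryless approach can be sketched in the same spirit: Sarsa(λ) learns Q-values indexed by the noisy inspection result instead of the hidden deterioration level, so the resulting policy maps each observation directly to an action. The Python sketch below is illustrative only and every detail is an assumption: the noise model (an inspection reports the true level with probability `OBS_ACCURACY`, otherwise a neighbouring level), the per-interval degradation probability, the costs, replacing eligibility traces decayed by `lam * exp(-BETA * sojourn)`, and the use of a discounted rather than average-cost criterion. The memory-based NSM algorithm is not covered. The names `step`, `inspect`, `greedy`, and `sarsa_lambda` are hypothetical.

```python
import math
import random

# Hypothetical partially observable model (illustrative assumptions only)
N = 4                                   # hidden levels; 3 = failed
CONTINUE, MAINTAIN = 0, 1
COST_RATE = [1.0, 2.0, 10.0, 50.0]      # operating cost per unit time
PM_COST, CM_COST = 20.0, 60.0           # preventive / corrective maintenance cost
P_DEGRADE = 0.3                         # chance of dropping one level per interval
OBS_ACCURACY = 0.8                      # inspection reports the true level w.p. 0.8
TAU, MAINT_TIME, BETA = 1.0, 0.5, 0.1   # interval, repair time, discount rate


def step(state, action, rng):
    """One epoch of the hidden process: (next_state, cost, sojourn)."""
    if action == MAINTAIN:
        return 0, (CM_COST if state == N - 1 else PM_COST), MAINT_TIME
    nxt = min(state + 1, N - 1) if rng.random() < P_DEGRADE else state
    # Simplification: the whole interval is charged at the starting level's rate.
    return nxt, COST_RATE[state] * TAU, TAU


def inspect(state, rng):
    """Noisy inspection: the true level, or a neighbouring level clipped to [0, N-1]."""
    if rng.random() < OBS_ACCURACY:
        return state
    return min(max(state + rng.choice((-1, 1)), 0), N - 1)


def greedy(q_row):
    """Action with the lowest estimated discounted cost."""
    return min((CONTINUE, MAINTAIN), key=lambda a: q_row[a])


def sarsa_lambda(steps=200_000, alpha=0.02, lam=0.8, eps=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N)]   # indexed by observation (memoryless)
    e = [[0.0, 0.0] for _ in range(N)]   # eligibility traces

    def choose(obs):                     # epsilon-greedy over observations
        if rng.random() < eps:
            return rng.choice((CONTINUE, MAINTAIN))
        return greedy(q[obs])

    state = 0
    obs = inspect(state, rng)
    action = choose(obs)
    for _ in range(steps):
        state, cost, sojourn = step(state, action, rng)
        next_obs = inspect(state, rng)
        next_action = choose(next_obs)
        gamma = math.exp(-BETA * sojourn)        # SMDP discount over the sojourn
        delta = cost + gamma * q[next_obs][next_action] - q[obs][action]
        e[obs][action] = 1.0                     # replacing trace
        for o in range(N):
            for a in (CONTINUE, MAINTAIN):
                q[o][a] += alpha * delta * e[o][a]
                e[o][a] *= gamma * lam           # decay all traces
        obs, action = next_obs, next_action
    return q


q = sarsa_lambda()
print([greedy(row) for row in q])   # learned action per inspection result
```

Because observations 1 and 2 alias several hidden levels, only the clear-cut readings (0 and 3) are expected to give a stable ranking of the two actions in this toy setting.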
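【Degree-granting institution】: Hefei University of Technology
【Degree level】: Master's
【Year degree conferred】: 2011
【CLC classification number】: TH17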

