Research on Maintenance Strategies for Multi-Yield Deteriorating Production Systems Based on Reinforcement Learning

Published: 2018-12-18 02:21

[Abstract]: In manufacturing systems, equipment condition deteriorates because of fatigue, wear, aging, and similar causes. Equipment running in a deteriorated state lowers product quality and raises production cost. Maintenance actions such as inspection, repair, or replacement can keep equipment from operating in a poor state. Excessive maintenance, however, interrupts production and increases equipment downtime and system maintenance cost. Setting a sound equipment maintenance strategy is therefore very important for manufacturing systems. Although many researchers in China and abroad have studied equipment maintenance in production systems from many angles, topics related to product quality management are rarely addressed in the literature. In real production systems, equipment state often affects product quality, which gives rise to the multi-yield quality problem: as its state worsens, the equipment produces defective items with higher probability. Equipment state can therefore be inferred from product quality inspection data and used to determine the optimal maintenance strategy.

In recent years, maintenance strategies for flow line systems have attracted growing attention, particularly the two-machine flow line made up of an upstream and a downstream machine in series with an intermediate inventory buffer, referred to as the 2M1B system. Most of this work, however, rests on strong assumptions, for example that production time and maintenance time each take one time unit, or that maintenance resources are plentiful and available at any time. Maintenance decisions made under these assumptions lack a realistic basis. This thesis therefore starts from preventive maintenance strategies for a single deteriorating machine with the multi-yield quality problem, then analyzes preventive maintenance strategies for deteriorating machines in the 2M1B flow line system, and further examines how limited maintenance resources affect the preventive maintenance strategy. Finally, it improves the solution method used for the models. The main research content and results are as follows:

(1) A predictive maintenance method is proposed for a single deteriorating machine with the multi-yield quality problem, carried out in two stages. First, a continuous-time, discrete-state semi-Markov model describes the machine's deterioration process, and a reinforcement learning method based on policy iteration solves the model to obtain a maintenance policy based on the machine's observed state. The learned policy is then applied in a re-simulation of the system model to estimate future maintenance times. Numerical examples show that the machine's future maintenance time decreases as the total number of items produced increases and, for a given total number of items produced, also decreases as the number of defective items increases. A growing number of maintenance actions likewise brings the maintenance time forward.

(2) Building on the single-machine study, maintenance strategies for deteriorating machines in the 2M1B flow line system are analyzed. A two-agent semi-Markov decision process model describes the deterioration of the machines in the system. A distributed multi-agent reinforcement learning method, the costs-sharing-RL method, is proposed to solve the model. With the objective of minimizing the system's long-run expected average cost rate, the method accounts for the link between each agent's local decisions and the global optimization objective in order to obtain the optimal maintenance strategy for the system.

(3) Maintenance strategies for deteriorating machines in the 2M1B flow line system are then considered under limited maintenance resources. Limited maintenance resources are assumed to result in imperfect maintenance, and a continuous-time, discrete-state semi-Markov model describes the deterioration process. A resource-constrained distributed multi-agent reinforcement learning method, the RC-costs-sharing-RL method, is used to solve the model. A numerical example of the 2M1B flow line system shows that the RC-costs-sharing-RL method outperforms two other methods, the sequential PM method and the independent-RL method, and obtains the optimal maintenance strategy for the system.

(4) From the standpoint of practical application, and against the background of the deteriorating-machine maintenance problem in the 2M1B flow line system, a heuristically accelerated multi-agent reinforcement learning method, the HAMSL method, is proposed. The aim is to use a heuristic function to improve the learning efficiency of multi-agent reinforcement learning while minimizing the system's average cost rate. Experimental results show that the HAMSL method learns more efficiently than several reinforcement learning methods based on traditional heuristic search techniques, such as ε-greedy multi-agent reinforcement learning, neighborhood search multi-agent reinforcement learning, simulated annealing search multi-agent reinforcement learning, and tabu search multi-agent reinforcement learning.
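The abstract's starting point, that equipment state can be inferred from product quality inspection data, can be illustrated with a minimal sketch. It is not the method used in the thesis, which relies on semi-Markov models and reinforcement learning. It is a plain Bayesian update over a few hypothetical deterioration states, and the function name `update_belief`, the three states, and every probability below are assumptions chosen only for illustration.

```python
from math import comb


def update_belief(prior, defect_probs, n_items, n_defects):
    """Return the posterior over deterioration states after one inspection.

    prior        -- prior probability of each state (sums to 1)
    defect_probs -- assumed per-item defect probability in each state
    n_items      -- number of items inspected
    n_defects    -- number of those items found defective
    """
    if len(prior) != len(defect_probs):
        raise ValueError("prior and defect_probs must have the same length")
    if not 0 <= n_defects <= n_items:
        raise ValueError("n_defects must lie between 0 and n_items")

    # Binomial likelihood of the observed defect count in each state.
    likelihoods = [
        comb(n_items, n_defects) * p ** n_defects * (1 - p) ** (n_items - n_defects)
        for p in defect_probs
    ]
    unnormalized = [b * l for b, l in zip(prior, likelihoods)]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("observation has zero probability under every state")
    return [u / total for u in unnormalized]


# Hypothetical example: three states (good, worn, badly worn) whose
# defect probability rises as the machine deteriorates.
prior = [0.6, 0.3, 0.1]
defect_probs = [0.02, 0.10, 0.30]

posterior_many_defects = update_belief(prior, defect_probs, n_items=20, n_defects=4)
posterior_no_defects = update_belief(prior, defect_probs, n_items=20, n_defects=0)
```

Under these assumed numbers, a batch with several defective items moves probability mass toward the worse states, while a defect-free batch moves it toward the good state. A maintenance rule could then act on that belief, for example by triggering maintenance once the probability of the worst state passes a chosen threshold.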
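[Degree-granting institution]: Huazhong University of Science and Technology
[Degree level]: Doctoral
[Year degree conferred]: 2014
[Classification number]: TP18;TH17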
