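Research and Application of a Reinforcement Learning-Based Storage Location Optimization Algorithm in a Material Pull System
Published: 2018-11-16 06:51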
[Abstract]: The rise of mechanical automation and assembly-line technology has driven the rapid development of modern manufacturing. To gain an advantage in a fiercely competitive environment, manufacturers actively seek ways to control production costs and raise production efficiency. Storage location allocation in an automated high-bay warehouse is an important part of production logistics in manufacturing, and it has a significant effect on both the operating efficiency and the energy consumption of the production line. A good allocation strategy can effectively reduce the time and energy consumed in production logistics and improve the operating efficiency of the production system. Based on the automated high-bay warehouse of a domestic complete-vehicle manufacturing plant, this thesis derives optimization requirements from the problems found in current warehouse management: low inbound and outbound efficiency, high operating energy consumption, and a low degree of intelligence in allocation. The storage location allocation optimization model that is established has large-scale condition information, discrete inputs and outputs, and a global optimization objective. Given these characteristics, and after analyzing the characteristics of different solution approaches, the thesis adopts a reinforcement learning algorithm based on environment abstraction and temporal abstraction. To handle the large scale of the problem's conditions, redundancy is removed from the environment information and the information is abstracted into layers, so that concrete storage location information becomes detail-independent classification and evaluation information. This reduces the scale of the problem's input conditions and improves both computation speed and convergence speed.
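The environment abstraction described above can be illustrated with a minimal Python sketch. The abstract does not give the thesis's actual classification scheme, so everything here is an assumption made for illustration: the `Slot` fields, the `travel_cost` model (grid distance from an I/O point at the origin), and the grouping of slots into cost zones are all hypothetical. The sketch only shows the general idea of replacing one input per storage location with one summary value per class.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Slot:
    """One concrete storage location (hypothetical fields)."""
    row: int
    column: int
    level: int
    occupied: bool


def travel_cost(slot: Slot) -> int:
    # Assumed cost model: grid distance from an I/O point at (0, 0, 0).
    return slot.row + slot.column + slot.level


def abstract_state(slots: List[Slot], zone_width: int = 2, n_zones: int = 3) -> List[float]:
    """Collapse per-slot detail into one free-capacity ratio per cost zone."""
    free = [0] * n_zones
    total = [0] * n_zones
    for slot in slots:
        # Slots with similar travel cost fall into the same zone;
        # everything beyond the last boundary goes into the last zone.
        zone = min(travel_cost(slot) // zone_width, n_zones - 1)
        total[zone] += 1
        if not slot.occupied:
            free[zone] += 1
    # An empty zone is reported as having no free capacity.
    return [free[z] / total[z] if total[z] else 0.0 for z in range(n_zones)]
```

With this kind of mapping the learner's input size depends on the number of zones rather than on the number of storage locations, which is the reduction in input scale that the abstract refers to.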
Because the problem is one of global optimization, the model's decision process is temporally abstracted following the idea of the semi-Markov decision process (SMDP), so that real-time evaluation is deferred to periodic evaluation. The model's decision direction is adjusted according to the quality of the statistics computed over each period, which avoids poor overall allocation caused by the model pursuing the immediate effect of each individual allocation. Because training samples are scarce and storage space is limited in the storage location allocation problem, a BP (backpropagation) neural network is used to approximate the model's value function, and it is trained on the evaluation result of the best historical allocation period and that of the current allocation period. This avoids the huge storage requirement, long training time, and high demands on samples that come with computing the value function by table lookup. Finally, a storage location allocation system is built on the basis of this research. The thesis describes the main design and implementation process of the system and shows its optimization effect on the production logistics of the automobile plant.
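The combination of periodic evaluation and a backpropagation-trained value approximator can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the class names `TinyValueNet` and `PeriodicTrainer`, the network size, the learning rate, and above all the training target (the relative improvement of the current period's mean cost over the best historical period, shared by every decision in the period) are hypothetical choices, since the abstract does not specify them. Costs are assumed to be strictly positive.

```python
import math
import random
from typing import List, Optional, Tuple


class TinyValueNet:
    """One-hidden-layer network trained by backpropagation (plain Python)."""

    def __init__(self, n_in: int, n_hidden: int = 4, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _hidden(self, x: List[float]) -> List[float]:
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.w1, self.b1)]

    def predict(self, x: List[float]) -> float:
        h = self._hidden(x)
        return sum(w * hi for w, hi in zip(self.w2, h)) + self.b2

    def train_step(self, x: List[float], target: float, lr: float = 0.1) -> None:
        """One gradient step on the squared error 0.5 * (prediction - target)**2."""
        h = self._hidden(x)
        error = sum(w * hi for w, hi in zip(self.w2, h)) + self.b2 - target
        for j, hj in enumerate(h):
            # Gradient at the hidden pre-activation, using the old output weight.
            grad_pre = error * self.w2[j] * (1.0 - hj * hj)
            self.w2[j] -= lr * error * hj
            self.b1[j] -= lr * grad_pre
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * grad_pre * xi
        self.b2 -= lr * error


class PeriodicTrainer:
    """Defers evaluation to the end of a period instead of scoring each decision."""

    def __init__(self, net: TinyValueNet) -> None:
        self.net = net
        self.best_mean_cost: Optional[float] = None
        self.period: List[Tuple[List[float], float]] = []

    def record(self, state: List[float], cost: float) -> None:
        # Only store the decision; no learning happens in real time.
        self.period.append((state, cost))

    def end_period(self) -> float:
        if not self.period:
            raise ValueError("no decisions recorded in this period")
        mean_cost = sum(cost for _, cost in self.period) / len(self.period)
        if self.best_mean_cost is None:
            reward = 0.0  # first period: nothing to compare against
        else:
            # Assumed target: relative improvement over the best historical period.
            reward = (self.best_mean_cost - mean_cost) / self.best_mean_cost
        for state, _ in self.period:
            self.net.train_step(state, reward)
        if self.best_mean_cost is None or mean_cost < self.best_mean_cost:
            self.best_mean_cost = mean_cost
        self.period = []
        return reward
```

A caller would invoke `record` once per allocation with a state feature vector and the observed cost, then call `end_period` when the evaluation cycle closes. The network stores a fixed number of weights regardless of how many distinct states are visited, which is the storage advantage over a lookup table that the abstract mentions.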
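[Degree-granting institution]: Southwest Jiaotong University
[Degree level]: Master's
[Year degree granted]: 2015
[Classification number]: U468.8
Article ID: 2334777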
Article link: https://www.wllwen.com/guanlilunwen/wuliuguanlilunwen/2334777.html