An Online Q-Learning Control Model for a Single Intersection, Oriented Toward Queue-Length Management

Published: 2018-03-02 04:28

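Keywords: traffic engineering; signalized intersection; fixed-cycle Q-learning signal timing; variable-cycle Q-learning signal timing. Source: Changsha University of Science and Technology, master's thesis, 2014. Document type: degree thesis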


[Abstract]: To optimize intersection signal timing, this thesis builds an integrated Excel VBA-Vissim-Matlab simulation platform and, on it, an online Q-learning model for a single intersection whose optimization objective is to minimize the difference in total critical queue length. The online model comes in two variants: a fixed-cycle Q-learning timing model and a variable-cycle Q-learning timing model. Because the control performance index is insensitive to neighboring timing plans, the reward function is reconstructed using the difference in average total critical queue length as its basic unit, with the aim of widening the gap between the Q-values of different actions and improving the model's convergence speed and robustness. A fixed-cycle, two-phase Q-learning example shows that the model is correct, that it re-optimizes dynamically as traffic volume changes, and that reusing prior experience shortens learning time. A simulation test of traffic conditions at Houzishi Bridge (猴子石大桥) shows that the model has good practical applicability. A comparison of the fixed-cycle Q-learning timing plan, the variable-cycle Q-learning timing plan, and the Transyt timing plan shows that taking the difference in total critical queue length as the optimization objective can optimize the space-time resources of the whole intersection, and that the online Q-learning model built in this thesis has high accuracy, robustness, and learning ability and can reach the optimization objective through learning. The performance of the fixed-cycle and variable-cycle Q-learning timing models under changing traffic volumes is also examined.
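The abstract does not give the thesis's state space, action set, or reward formula, so the following is only a minimal Python sketch of generic tabular Q-learning with a reward expressed in multiples of a queue-length "unit". The function names, the `unit` parameter, the epsilon-greedy rule, and the learning-rate and discount values are all illustrative assumptions, not the author's model, and no Vissim or Matlab interface is involved.

```python
import random


def scaled_reward(queue_diff, unit):
    """Hypothetical reward: negative absolute queue-length difference,
    measured in multiples of `unit` (an assumed scaling constant)."""
    return -abs(queue_diff) / unit


def choose_action(q_row, epsilon, rng):
    """Epsilon-greedy choice over one row of the Q-table.

    With probability `epsilon` a random action index is returned;
    otherwise the index of the largest Q-value in the row.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=q_row.__getitem__)


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard one-step tabular Q-learning update, applied in place.

    Returns the updated Q-value for (state, action).
    """
    target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


if __name__ == "__main__":
    rng = random.Random(0)
    # Two assumed states and three assumed actions (e.g. candidate green splits).
    q_table = [[0.0, 0.0, 0.0] for _ in range(2)]
    action = choose_action(q_table[0], epsilon=0.1, rng=rng)
    # A made-up observation: queues differ by 6 vehicles, unit of 3 vehicles.
    reward = scaled_reward(queue_diff=6, unit=3)
    q_update(q_table, state=0, action=action, reward=reward, next_state=1)
    print(q_table)
```

In a setup like the one the abstract describes, the queue-length difference would come from the traffic simulator after each signal cycle rather than from a hard-coded number as in this demo.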
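[Degree-granting institution]: Changsha University of Science and Technology
[Degree level]: Master's
[Year degree granted]: 2014
[Classification number]: U491.54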


Article ID: 1555026

Article link: https://www.wllwen.com/kejilunwen/jiaotonggongchenglunwen/1555026.html

