
Research on Risk-Sensitive Reinforcement Learning for Traffic Signal Timing

[Abstract]: Urban traffic problems have become a major factor constraining urban economic development. Relieving congestion and keeping the traffic system running smoothly and in an orderly manner has become a top priority for government. Since urban space is limited and economic and environmental demands rule out further expansion of transport infrastructure, developing intelligent transportation has become the only practical way to address congestion. Building on a review of domestic and international research on intelligent transportation systems, this thesis studies the application of risk-sensitive theory and Q-learning to traffic signal control optimization. The main work comprises:

1. An online signal timing optimization model based on risk-averse Q-learning. Most existing reinforcement learning models for signal timing are risk-neutral, and in online learning they suffer from poor stability and robustness, long running times, and weak convergence. To address these problems, this thesis builds a risk-averse reinforcement learning model for traffic signal timing, using the queue length difference as the traffic evaluation index. Simulation experiments on an integrated VISSIM-Excel VBA-Matlab platform analyze how the risk degree coefficient affects the quality of the timing plans and the convergence of learning. Compared with the risk-neutral reinforcement learning model, the new model is considerably more stable and converges faster, while performing roughly the same on the traffic evaluation index. For signal timing optimization problems of this kind, an incremental risk-averse reinforcement learning method should be adopted, i.e., the risk degree coefficient should be increased in small steps.

2. An online signal timing optimization model based on risk-seeking Q-learning. Traffic conditions are random and uncertain and cannot be expected to evolve as traffic planners anticipate, so it is sometimes necessary to consider every situation that may arise, even at higher risk. This thesis therefore further constructs an online signal timing optimization model based on risk-seeking Q-learning, again using the queue length difference as the traffic evaluation index and keeping all modeling conditions identical to the risk-averse model so the two can be compared directly. Simulation experiments on the same integrated VISSIM-Excel VBA-Matlab platform analyze the influence of the risk degree coefficient on timing plan quality and convergence. Compared with the risk-neutral model, the risk-seeking model converges faster; compared with the risk-averse model, it explores a wider range and trains more actions, but the performance of its timing plans is inconsistent and the model is unstable.
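As a concrete illustration of the risk-sensitive Q-learning described above, the following is a minimal Python sketch, not the thesis's actual implementation (which ran on a VISSIM-Excel VBA-Matlab platform). It assumes the common Mihatsch-Neuneier formulation of risk-sensitive reinforcement learning, in which a risk degree coefficient kappa in (-1, 1) weights the temporal-difference error asymmetrically: kappa > 0 gives risk-averse learning, kappa < 0 risk-seeking, and kappa = 0 recovers the risk-neutral model. The reward signal, the function names, and the action set of candidate timing plans are illustrative assumptions.

    import random

    def risk_weighted(delta, kappa):
        # Asymmetric weighting of the TD error: positive errors are scaled
        # by (1 - kappa) and negative errors by (1 + kappa), so kappa > 0
        # penalizes bad surprises more heavily (risk-averse) while
        # kappa < 0 emphasizes optimistic outcomes (risk-seeking).
        return (1 - kappa) * delta if delta > 0 else (1 + kappa) * delta

    def update_q(Q, s, a, reward, s_next, actions,
                 alpha=0.1, gamma=0.9, kappa=0.3):
        # One risk-sensitive Q-learning step. Q is a dict keyed by
        # (state, action); reward would be the negative queue length
        # difference between successive signal cycles (an assumption
        # based on the evaluation index named in the abstract).
        best_next = max(Q.get((s_next, b), 0.0) for b in actions)
        delta = reward + gamma * best_next - Q.get((s, a), 0.0)
        Q[(s, a)] = Q.get((s, a), 0.0) + alpha * risk_weighted(delta, kappa)

    def choose_action(Q, s, actions, epsilon=0.1):
        # Epsilon-greedy selection over the candidate timing plans.
        if random.random() < epsilon:
            return random.choice(actions)
        return max(actions, key=lambda b: Q.get((s, b), 0.0))

Under this reading, the abstract's recommendation of an incremental risk-averse method would correspond to starting training with kappa near 0 and raising it in small steps toward its target across episodes (e.g. kappa = min(kappa_target, kappa + 0.05) per episode), so that early learning stays close to the risk-neutral model while risk avoidance tightens gradually.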
【Degree-granting institution】: Changsha University of Science & Technology
【Degree level】: Master's
【Year of degree conferral】: 2014
【CLC number】: U491.54

