
Research on a Lane-Keeping Method Based on Safe Reinforcement Learning and Its Validation in SUMO

Published: 2021-11-03 12:00
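  Autonomous driving will change everyday transportation in the near future, and a great deal of work has gone into decision-making and motion-control algorithms for autonomous vehicles. Reinforcement learning (RL) has so far been the main strategy applied in this area. When RL is applied to autonomous driving, however, the actions it takes during exploration can create safety hazards, and the algorithm may converge too slowly. Moving RL out of the laboratory and into vehicles that learn autonomously in practice therefore urgently requires solving its safety problem. This thesis proposes a safe reinforcement learning (Safe RL) algorithm for autonomous driving that adds constraints to ensure safety during learning. The proposed algorithm is Constrained Policy Optimization (CPO), whose key idea is to introduce constraints into the cost function. CPO builds on the Actor-Critic framework and sets hard constraints that limit the size of each policy update, which keeps the policy update process safe. The main work of the thesis comprises the theoretical proof and derivation of the CPO algorithm, its practical application, and an analysis of the simulation results. The thesis compares the proposed algorithm on several maps, evaluating and analyzing its safety and stability on each. It also compares the CPO algorithm with conventional reinforcement...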
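
The abstract describes CPO only in words, and this listing does not reproduce the thesis's own notation. As a rough orientation, the policy update is usually written in the following form in the CPO literature (Achiam et al., 2017). All symbols here are assumptions taken from that standard formulation and are not quoted from the thesis: $\pi_k$ is the current policy, $A^{\pi_k}$ and $A^{\pi_k}_{C}$ are the reward and cost advantage functions, $J_C$ is the expected discounted cost, $d$ is the cost limit, $\gamma$ is the discount factor, and $\delta$ is the trust-region radius.

```latex
% Standard CPO update (assumed notation; not quoted from the thesis).
\begin{aligned}
\pi_{k+1} = \arg\max_{\pi \in \Pi_\theta} \quad
  & \mathbb{E}_{s \sim d^{\pi_k},\, a \sim \pi}\!\left[ A^{\pi_k}(s,a) \right] \\
\text{s.t.} \quad
  & J_{C}(\pi_k) + \frac{1}{1-\gamma}\,
    \mathbb{E}_{s \sim d^{\pi_k},\, a \sim \pi}\!\left[ A^{\pi_k}_{C}(s,a) \right] \le d, \\
  & \bar{D}_{\mathrm{KL}}\!\left( \pi \,\|\, \pi_k \right) \le \delta .
\end{aligned}
```

Read this way, the first constraint is the safety condition on the cost, and the KL bound is what limits the size of each policy update.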

[Source]: Tsinghua University, Beijing (Project 211 university; Project 985 university; directly under the Ministry of Education)

[Pages]: 70

[Degree level]: Master's

[Table of contents]:
ABSTRACT (IN CHINESE)
ABSTRACT
CHAPTER 1.INTRODUCTION
    1.1 GENERAL INTRODUCTION AND BACKGROUND
    1.2 PROBLEM STATEMENT
    1.3 OBJECTIVE
    1.4 THESIS OUTLINE
CHAPTER 2.LITERATURE REVIEW
    2.1 THE RESEARCH STATUS OF REINFORCEMENT LEARNING
    2.2 REINFORCEMENT LEARNING THEORY AND STRUCTURE
        2.2.1 MARKOV DECISION PROCESS AND STRUCTURE
        2.2.2 BELLMAN EQUATION
    2.3 REINFORCEMENT LEARNING CLASSIFICATIONS
    2.4 REINFORCEMENT LEARNING ALGORITHMS
        2.4.1 DYNAMIC PROGRAMMING
        2.4.2 Q-LEARNING
        2.4.3 SARSA ALGORITHM
        2.4.4 POLICY GRADIENT METHODS
        2.4.5 ACTOR-CRITIC
    2.5 THE RESEARCH STATUS OF SAFE REINFORCEMENT LEARNING
        2.5.1 BASED ON THE MODIFICATION IN OPTIMIZATION CRITERIA
        2.5.2 BASED ON THE MODIFICATION IN EXPLORATION PROCESS
CHAPTER 3.CONSTRAINED POLICY OPTIMIZATION
    3.1 CPO ALGORITHM
        3.1.1 CONSTRAINED MARKOV DECISION PROCESS (CMDP)
        3.1.2 TRUST REGION POLICY OPTIMIZATION (TRPO) ALGORITHM
        3.1.3 TRUST REGION APPLIED TO CONSTRAINED POLICY OPTIMIZATION
    3.2 LANE KEEPING BASED ON CONSTRAINED POLICY OPTIMIZATION ALGORITHM
        3.2.1 MARKOV MODELING OF LANE KEEPING PROBLEMS
        3.2.2 APPROXIMATE SOLUTION OF CPO ALGORITHM
CHAPTER 4.EXPERIMENT DESIGN & DATA ANALYSIS
    4.1 EXPERIMENT DESIGN
    4.2 MAP DESIGN AND ANALYSIS
        4.2.1 STRAIGHT ROAD
        4.2.2 S-SHAPED CURVED ROAD
        4.2.3 LOOP
        4.2.4 ROUNDABOUT
    4.3 RL VS CPO ENHANCED SAFE-RL
CHAPTER 5.SIMULATION ANALYSIS
    5.1 SUMO (SIMULATION OF URBAN MOBILITY)
    5.2 INTRODUCTION TO TRACI
    5.3 ANALYSIS OF LANE KEEPING PERFORMANCE
    5.4 CHAPTER SUMMARY
CHAPTER 6.CONCLUSION AND FUTURE WORK
    6.1 SUMMARY AND CONTRIBUTIONS
    6.2 FUTURE WORK
REFERENCES
ACKNOWLEDGEMENT
RESUME



Article ID: 3473643


Article link: https://www.wllwen.com/kejilunwen/qiche/3473643.html

