Machine Learning-Based Optimization of Motion and Cooperation for 3D Simulated Soccer Robots

Published: 2019-06-04 09:34

[Abstract]: This thesis builds an HTCondor high-throughput computing cluster for the RoboCup 3D simulation platform and, on top of it, establishes a motion optimization mechanism for individual RoboCup 3D robots, training and optimizing their kicking and walking parameters. It then studies effective cooperative and adversarial strategies for multiple agents in a dynamic environment, focusing on two problems: team formation and role assignment. For individual motion optimization, because optimization on a single machine is slow, the high-throughput computing cluster is used to distribute the work across networked resources and shorten optimization time. The CMA-ES algorithm is then applied to the kicking motions of the five robot types; a reinforcement learning training framework built on this algorithm successfully optimized the robots' long-range kick and quick kick. To address the overfitting of walking optimization to a single training task, a layered learning method with multiple subtasks and multiple parameter subsets is designed, which improves the mobility and stability of walking, turning, and dribbling for all five robot types. For multi-agent cooperation, formation optimization and role assignment optimization are studied separately. First, the field is partitioned with a Delaunay triangulation, and team formations are designed with situation-based strategic positioning (SBSP), which effectively diversifies the team's overall formation when the ball is at key positions on the field. With the formation fixed, role assignment is modeled as a Markov decision process (MDP) that takes into account factors such as distance, orientation, whether a robot has fallen, and speed for the five robot types in the simulation environment. The action-value function of the MDP is solved with the Sarsa(λ) learning algorithm with linear function approximation to find the optimal role assignment, improving the team's overall offensive and defensive efficiency. Repeated experiments show that this work brings considerable improvement to the Apollo3D team, both for individual robots and for team-level formation assignment and role rotation.
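The abstract names Sarsa(λ) with linear function approximation as the solver for the role-assignment MDP but gives no details of the state features or rewards. As a rough illustration of the general technique only, the sketch below runs Sarsa(λ) with accumulating eligibility traces on a hypothetical five-cell corridor task. The environment, the one-hot features, the names `features`, `step`, `choose`, and `sarsa_lambda`, and all hyperparameter values are assumptions made for this example and are not taken from the thesis.

```python
import random

ACTIONS = (-1, +1)   # hypothetical toy actions: step left / step right
N_STATES = 5         # corridor cells 0..4; cell 4 is the terminal goal


def features(state, action):
    """One-hot feature vector over (state, action) pairs (an assumption)."""
    x = [0.0] * (N_STATES * len(ACTIONS))
    x[state * len(ACTIONS) + ACTIONS.index(action)] = 1.0
    return x


def q_value(w, state, action):
    """Linear action value: Q(s, a) = w . x(s, a)."""
    return sum(wi * xi for wi, xi in zip(w, features(state, action)))


def step(state, action):
    """Toy dynamics: move along the corridor; reward 1 on reaching the goal."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def choose(w, state, epsilon, rng):
    """Epsilon-greedy action selection with random tie-breaking."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q_value(w, state, a) for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q_value(w, state, a) == best])


def sarsa_lambda(episodes=200, alpha=0.1, gamma=0.95, lam=0.8,
                 epsilon=0.1, max_steps=100, seed=0):
    """Return the learned weight vector for the toy corridor task."""
    rng = random.Random(seed)
    w = [0.0] * (N_STATES * len(ACTIONS))
    for _ in range(episodes):
        z = [0.0] * len(w)                  # eligibility traces, reset per episode
        s = 0
        a = choose(w, s, epsilon, rng)
        for _ in range(max_steps):
            s2, r, done = step(s, a)
            delta = r - q_value(w, s, a)    # TD error (terminal case)
            if not done:
                a2 = choose(w, s2, epsilon, rng)
                delta += gamma * q_value(w, s2, a2)   # on-policy bootstrap
            # Accumulating traces: decay all traces, then add current features.
            z = [gamma * lam * zi + xi for zi, xi in zip(z, features(s, a))]
            w = [wi + alpha * delta * zi for wi, zi in zip(w, z)]
            if done:
                break
            s, a = s2, a2
    return w
```

Calling `sarsa_lambda()` returns one weight per (state, action) pair, and `q_value(w, s, a)` reads off the learned estimate. In a role-assignment setting the one-hot features would presumably be replaced by hand-built features such as distance, orientation, and fallen status, but that mapping is not specified in the abstract.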
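[Degree-granting institution]: Nanjing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP242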

[References]