
RRT-guided experience generation for reinforcement learning in autonomous lane keeping

Author information

Bécsi Tamás

Affiliation

Department of Control for Transportation and Vehicle Systems, Faculty of Transportation Engineering and Vehicle Engineering, Budapest University of Technology and Economics, Budapest, 1111, Hungary.

Publication information

Sci Rep. 2024 Oct 14;14(1):24059. doi: 10.1038/s41598-024-73881-z.

Abstract

Reinforcement Learning has emerged as a significant component of Machine Learning in the domain of highly automated driving, facilitating tasks ranging from high-level navigation to low-level control such as trajectory tracking and lane keeping. However, the agent's action choice during training is constrained by the balance between exploitation and exploration, which can impede effective learning, especially in environments with sparse rewards. To address this challenge, researchers have explored combining RL with sampling-based exploration methods such as Rapidly-exploring Random Trees (RRT) to aid exploration. This paper investigates the effectiveness of classic exploration strategies in RL algorithms, focusing on their ability to cover the state space and provide a high-quality experience pool for learning agents. The study centers on the lane-keeping problem of a dynamic vehicle model handled by RL, examining a scenario where reward shaping is omitted, leading to sparse rewards. The paper demonstrates that classic exploration techniques often cover only a small portion of the state space, hindering learning. By leveraging RRT to broaden the experience pool, the agent can learn a better policy, as exemplified by the dynamic vehicle model's lane-following problem.
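The core idea described above — growing an RRT over the vehicle's state space and storing each tree extension as a transition in the agent's replay buffer — can be illustrated with a minimal sketch. This is not the paper's implementation: the kinematic model, action set, state bounds, and reward band below are all simplified assumptions chosen only to show how RRT expansion yields (state, action, reward, next-state) tuples with broad state coverage under a sparse reward.

```python
import math
import random

random.seed(0)


def step(state, action, dt=0.1, v=10.0):
    """Hypothetical kinematic lane-keeping model (assumption, not the
    paper's dynamic vehicle model): state = (lateral_offset, heading),
    action = heading rate, constant forward speed v."""
    y, psi = state
    return (y + v * math.sin(psi) * dt, psi + action * dt)


def rrt_experience(n_nodes=200, goal_band=0.1, actions=(-0.5, 0.0, 0.5)):
    """Grow an RRT over the (offset, heading) space; every tree extension
    is recorded as an (s, a, r, s') transition for the replay buffer.
    Reward is sparse: +1 only inside a narrow band around the lane centre."""
    nodes = [(0.0, 0.0)]          # root: on the centre line, heading straight
    buffer = []
    for _ in range(n_nodes):
        # sample a random target state within assumed bounds
        target = (random.uniform(-2.0, 2.0), random.uniform(-0.5, 0.5))
        # nearest existing node (Euclidean distance in state space)
        near = min(nodes, key=lambda s: (s[0] - target[0]) ** 2
                                        + (s[1] - target[1]) ** 2)
        # steer: pick the discrete action that moves 'near' closest to the sample
        a = min(actions, key=lambda u: sum((p - q) ** 2 for p, q
                                           in zip(step(near, u), target)))
        nxt = step(near, a)
        r = 1.0 if abs(nxt[0]) < goal_band else 0.0  # sparse reward
        buffer.append((near, a, r, nxt))
        nodes.append(nxt)
    return buffer


buffer = rrt_experience()
# coarse state-coverage measure: distinct visited cells after rounding
coverage = len({(round(s[0], 1), round(s[1], 1)) for _, _, _, s in buffer})
print(len(buffer), coverage)
```

The transitions in `buffer` could then be fed to any off-policy learner (e.g. a DQN-style agent) alongside its own rollouts; the RRT's space-filling bias is what supplies the rare rewarded states that ε-greedy exploration tends to miss.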


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/406a/11473803/2368dbbf7f7a/41598_2024_73881_Fig1_HTML.jpg
