
A Survey on Offline Reinforcement Learning: Taxonomy, Review, and Open Problems

Authors

Figueiredo Prudencio Rafael, Maximo Marcos R O A, Colombini Esther Luna

Publication information

IEEE Trans Neural Netw Learn Syst. 2024 Aug;35(8):10237-10257. doi: 10.1109/TNNLS.2023.3250269. Epub 2024 Aug 5.

Abstract

With the widespread adoption of deep learning, reinforcement learning (RL) has experienced a dramatic increase in popularity, scaling to previously intractable problems such as playing complex games from pixel observations, sustaining conversations with humans, and controlling robotic agents. However, a wide range of domains remains inaccessible to RL because interacting with the environment is costly and dangerous. Offline RL is a paradigm that learns exclusively from static datasets of previously collected interactions, making it feasible to extract policies from large and diverse training datasets. Effective offline RL algorithms have a much wider range of applications than online RL and are particularly appealing for real-world applications such as education, healthcare, and robotics. In this work, we contribute a unifying taxonomy for classifying offline RL methods. Furthermore, we provide a comprehensive review of the latest algorithmic breakthroughs in the field using a unified notation, as well as a review of the properties and shortcomings of existing benchmarks. Additionally, we provide a figure that summarizes the performance of each method and class of methods on different dataset properties, equipping researchers with the tools to decide which type of algorithm is best suited to the problem at hand and to identify which classes of algorithms look the most promising. Finally, we provide our perspective on open problems and propose future research directions for this rapidly growing field.

