Zhang Tiantian, Lin Zichuan, Wang Yuxing, Ye Deheng, Fu Qiang, Yang Wei, Wang Xueqian, Liang Bin, Yuan Bo, Li Xiu
IEEE Trans Neural Netw Learn Syst. 2024 Oct;35(10):14588-14602. doi: 10.1109/TNNLS.2023.3280085. Epub 2024 Oct 7.
A key challenge of continual reinforcement learning (CRL) in dynamic environments is to promptly adapt the reinforcement learning (RL) agent's behavior as the environment changes over its lifetime while minimizing catastrophic forgetting of the learned information. To address this challenge, we propose dynamics-adaptive continual RL (DaCoRL). DaCoRL learns a context-conditioned policy using progressive contextualization, which incrementally clusters a stream of stationary tasks in the dynamic environment into a series of contexts and uses an expandable multihead neural network to approximate the policy. Specifically, we define a set of tasks with similar dynamics as an environmental context and formalize context inference as online Bayesian infinite Gaussian mixture clustering on environment features, using online Bayesian inference to infer the posterior distribution over contexts. Under the assumption of a Chinese restaurant process (CRP) prior, this technique can accurately classify the current task as a previously seen context or instantiate a new context as needed, without relying on any external indicator to signal environmental changes in advance. Furthermore, we employ an expandable multihead neural network whose output layer is synchronously expanded with each newly instantiated context, together with a knowledge distillation regularization term for retaining performance on learned tasks. As a general framework that can be coupled with various deep RL algorithms, DaCoRL consistently outperforms existing methods in terms of stability, overall performance, and generalization ability, as verified by extensive experiments on several robot navigation and MuJoCo locomotion tasks.
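The CRP-prior context inference described above can be illustrated with a minimal sketch. This is not the paper's implementation: the class name, the isotropic Gaussian likelihood with known variance, and the hard (maximum a posteriori) assignment are all simplifying assumptions made here for illustration. Each context keeps a running mean of its environment features; a new task either joins the highest-scoring existing context or spawns a new one, with no external change signal.

```python
import numpy as np

class CRPContextInference:
    """Illustrative online context clustering under a CRP prior.

    Existing context k attracts a new task with prior mass c_k / (n + alpha);
    a brand-new context has mass alpha / (n + alpha). The prior is combined
    with an isotropic Gaussian likelihood over environment features.
    """

    def __init__(self, alpha=1.0, sigma=1.0, sigma0=3.0):
        self.alpha = alpha    # CRP concentration: propensity to open new contexts
        self.sigma = sigma    # within-context feature std (assumed known here)
        self.sigma0 = sigma0  # broad predictive std for a brand-new context
        self.means = []       # running feature mean per context
        self.counts = []      # number of tasks assigned per context

    def _log_gauss(self, x, mu, std):
        d = x - mu
        return -0.5 * np.dot(d, d) / std**2 - x.size * np.log(std)

    def infer(self, x):
        """Return the context index for feature vector x, creating one if needed."""
        x = np.asarray(x, dtype=float)
        n = sum(self.counts)
        scores = [np.log(c / (n + self.alpha)) + self._log_gauss(x, mu, self.sigma)
                  for mu, c in zip(self.means, self.counts)]
        # Score for instantiating a new context (CRP "new table" term).
        scores.append(np.log(self.alpha / (n + self.alpha))
                      + self._log_gauss(x, np.zeros_like(x), self.sigma0))
        k = int(np.argmax(scores))
        if k == len(self.means):              # instantiate a new context
            self.means.append(x.copy())
            self.counts.append(1)
        else:                                 # online update of the running mean
            self.counts[k] += 1
            self.means[k] += (x - self.means[k]) / self.counts[k]
        return k
```

With this sketch, two tasks with nearby features fall into one context, while a task with very different features instantiates a second context, mirroring the behavior the abstract attributes to the CRP-based inference.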
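The expandable multihead architecture and distillation regularizer can likewise be sketched in miniature. The shapes, the single-layer trunk, and the MSE-based distillation term below are illustrative assumptions, not the paper's network: the point is only that the output layer grows one head per context, and frozen snapshots of earlier heads anchor behavior on learned tasks.

```python
import numpy as np

class MultiHeadPolicy:
    """Illustrative expandable multihead policy with a distillation penalty.

    A shared trunk feeds one output head per context; snapshot() freezes
    the current heads so later training can be regularized toward them.
    """

    def __init__(self, obs_dim, act_dim, hidden=32, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = self.rng.standard_normal((obs_dim, hidden)) * 0.1  # shared trunk
        self.heads = []      # one output layer per context
        self.frozen = []     # frozen copies of old heads for distillation
        self.act_dim = act_dim

    def add_head(self):
        """Expand the output layer when a new context is instantiated."""
        hidden = self.W.shape[1]
        self.heads.append(self.rng.standard_normal((hidden, self.act_dim)) * 0.1)
        return len(self.heads) - 1

    def forward(self, obs, k):
        """Policy output of head k for a batch of observations."""
        h = np.tanh(obs @ self.W)
        return h @ self.heads[k]

    def snapshot(self):
        """Freeze current heads; their outputs define the distillation targets."""
        self.frozen = [head.copy() for head in self.heads]

    def distill_loss(self, obs):
        """Mean squared gap between current and frozen head outputs on obs."""
        h = np.tanh(obs @ self.W)
        return sum(np.mean((h @ new - h @ old) ** 2)
                   for new, old in zip(self.heads, self.frozen))
```

In use, `add_head` would be called whenever the context-inference step instantiates a new context, and `distill_loss` would be added to the RL objective while training on the new context.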