

Dynamics-Adaptive Continual Reinforcement Learning via Progressive Contextualization.

Authors

Zhang Tiantian, Lin Zichuan, Wang Yuxing, Ye Deheng, Fu Qiang, Yang Wei, Wang Xueqian, Liang Bin, Yuan Bo, Li Xiu

Publication

IEEE Trans Neural Netw Learn Syst. 2024 Oct;35(10):14588-14602. doi: 10.1109/TNNLS.2023.3280085. Epub 2024 Oct 7.

DOI: 10.1109/TNNLS.2023.3280085
PMID: 37285252
Abstract

A key challenge of continual reinforcement learning (CRL) in dynamic environments is to promptly adapt the reinforcement learning (RL) agent's behavior as the environment changes over its lifetime while minimizing the catastrophic forgetting of the learned information. To address this challenge, in this article, we propose DaCoRL, that is, dynamics-adaptive continual RL. DaCoRL learns a context-conditioned policy using progressive contextualization, which incrementally clusters a stream of stationary tasks in the dynamic environment into a series of contexts and opts for an expandable multihead neural network to approximate the policy. Specifically, we define a set of tasks with similar dynamics as an environmental context and formalize context inference as a procedure of online Bayesian infinite Gaussian mixture clustering on environment features, resorting to online Bayesian inference to infer the posterior distribution over contexts. Under the assumption of a Chinese restaurant process (CRP) prior, this technique can accurately classify the current task as a previously seen context or instantiate a new context as needed without relying on any external indicator to signal environmental changes in advance. Furthermore, we employ an expandable multihead neural network whose output layer is synchronously expanded with the newly instantiated context and a knowledge distillation regularization term for retaining the performance on learned tasks. As a general framework that can be coupled with various deep RL algorithms, DaCoRL features consistent superiority over existing methods in terms of stability, overall performance, and generalization ability, as verified by extensive experiments on several robot navigation and MuJoCo locomotion tasks.
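The context-inference step described above (CRP prior plus Gaussian mixture clustering over environment features) can be condensed into a short sketch. This is an illustrative simplification, not the paper's implementation: the function name, data layout, and the MAP assignment (instead of full online Bayesian posterior inference over an infinite Gaussian mixture) are assumptions made for brevity.

```python
import math

def crp_assign(feature, contexts, alpha=1.0, sigma=1.0):
    """Assign an environment-feature vector to an existing context or
    instantiate a new one, combining Chinese-restaurant-process (CRP)
    prior weights with an isotropic Gaussian likelihood. For brevity this
    takes the MAP assignment rather than sampling the full posterior.

    contexts: list of dicts {'mean': running feature mean, 'count': #tasks}.
    Returns the index of the chosen (possibly newly created) context.
    """
    n = sum(c["count"] for c in contexts)
    weights = []
    for c in contexts:
        # CRP prior: proportional to context size; likelihood: Gaussian
        dist2 = sum((f - m) ** 2 for f, m in zip(feature, c["mean"]))
        weights.append(c["count"] / (n + alpha) * math.exp(-dist2 / (2 * sigma**2)))
    weights.append(alpha / (n + alpha))  # weight of opening a new context

    k = max(range(len(weights)), key=weights.__getitem__)
    if k == len(contexts):
        # no existing context explains the features well: instantiate one
        contexts.append({"mean": list(feature), "count": 1})
    else:
        # fold the new task into the chosen context's running mean
        c = contexts[k]
        c["count"] += 1
        c["mean"] = [m + (f - m) / c["count"] for m, f in zip(c["mean"], feature)]
    return k
```

Note how the `alpha / (n + alpha)` term lets a sufficiently dissimilar task open a new context without any external change signal, mirroring the paper's claim that environmental shifts need not be announced in advance.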

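The other two components of the abstract, the expandable multihead policy network and the knowledge-distillation regularizer, can likewise be sketched. The class and function below are hypothetical illustrations (plain weight vectors stand in for the shared deep trunk, and KL divergence on temperature-softened outputs stands in for the paper's distillation term), not the authors' code.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distill_penalty(old_logits, new_logits, temperature=2.0):
    """KL(old || new) on temperature-softened outputs: penalizes the
    expanded network for drifting on contexts it has already learned."""
    p = softmax([x / temperature for x in old_logits])
    q = softmax([x / temperature for x in new_logits])
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

class MultiHeadPolicy:
    """One output head per context; a new head is appended in sync with
    each newly instantiated context (heads are bare per-context weight
    matrices here, purely for illustration)."""
    def __init__(self, feature_dim):
        self.feature_dim = feature_dim
        self.heads = []

    def expand(self, n_actions):
        # synchronously add a head for a newly instantiated context
        self.heads.append([[0.0] * self.feature_dim for _ in range(n_actions)])
        return len(self.heads) - 1

    def logits(self, context_id, features):
        return [sum(w * f for w, f in zip(row, features))
                for row in self.heads[context_id]]
```

During training on a new context, the total loss would combine the RL objective on the active head with `distill_penalty` terms on the frozen heads' outputs, which is the mechanism the abstract credits for retaining performance on learned tasks.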

Similar Articles

1. Dynamics-Adaptive Continual Reinforcement Learning via Progressive Contextualization.
   IEEE Trans Neural Netw Learn Syst. 2024 Oct;35(10):14588-14602. doi: 10.1109/TNNLS.2023.3280085. Epub 2024 Oct 7.
2. Lifelong Incremental Reinforcement Learning With Online Bayesian Inference.
   IEEE Trans Neural Netw Learn Syst. 2022 Aug;33(8):4003-4016. doi: 10.1109/TNNLS.2021.3055499. Epub 2022 Aug 3.
3. Catastrophic Interference in Reinforcement Learning: A Solution Based on Context Division and Knowledge Distillation.
   IEEE Trans Neural Netw Learn Syst. 2023 Dec;34(12):9925-9939. doi: 10.1109/TNNLS.2022.3162241. Epub 2023 Nov 30.
4. A Dirichlet Process Mixture of Robust Task Models for Scalable Lifelong Reinforcement Learning.
   IEEE Trans Cybern. 2023 Dec;53(12):7509-7520. doi: 10.1109/TCYB.2022.3170485. Epub 2023 Nov 29.
5. Context-Based Meta-Reinforcement Learning With Bayesian Nonparametric Models.
   IEEE Trans Pattern Anal Mach Intell. 2024 Oct;46(10):6948-6965. doi: 10.1109/TPAMI.2024.3386780. Epub 2024 Sep 5.
6. Adaptive Progressive Continual Learning.
   IEEE Trans Pattern Anal Mach Intell. 2022 Oct;44(10):6715-6728. doi: 10.1109/TPAMI.2021.3095064. Epub 2022 Sep 14.
7. On Sequential Bayesian Inference for Continual Learning.
   Entropy (Basel). 2023 May 31;25(6):884. doi: 10.3390/e25060884.
8. Continual Learning Using Bayesian Neural Networks.
   IEEE Trans Neural Netw Learn Syst. 2021 Sep;32(9):4243-4252. doi: 10.1109/TNNLS.2020.3017292. Epub 2021 Aug 31.
9. Continual Reinforcement Learning for Quadruped Robot Locomotion.
   Entropy (Basel). 2024 Jan 22;26(1):93. doi: 10.3390/e26010093.
10. Multiagent Continual Coordination via Progressive Task Contextualization.
    IEEE Trans Neural Netw Learn Syst. 2025 Apr;36(4):6326-6340. doi: 10.1109/TNNLS.2024.3394513. Epub 2025 Apr 4.