

A Dirichlet Process Mixture of Robust Task Models for Scalable Lifelong Reinforcement Learning.

Authors

Wang Zhi, Chen Chunlin, Dong Daoyi

Publication

IEEE Trans Cybern. 2023 Dec;53(12):7509-7520. doi: 10.1109/TCYB.2022.3170485. Epub 2023 Nov 29.

DOI: 10.1109/TCYB.2022.3170485
PMID: 35580095
Abstract

While reinforcement learning (RL) algorithms are achieving state-of-the-art performance in various challenging tasks, they can easily encounter catastrophic forgetting or interference when faced with lifelong streaming information. In this article, we propose a scalable lifelong RL method that dynamically expands the network capacity to accommodate new knowledge while preventing past memories from being perturbed. We use a Dirichlet process mixture to model the nonstationary task distribution, which captures task relatedness by estimating the likelihood of task-to-cluster assignments and clusters the task models in a latent space. We formulate the prior distribution of the mixture as a Chinese restaurant process (CRP) that instantiates new mixture components as needed. The update and expansion of the mixture are governed by the Bayesian nonparametric framework with an expectation maximization (EM) procedure, which dynamically adapts the model complexity without explicit task boundaries or heuristics. Moreover, we use the domain randomization technique to train robust prior parameters for the initialization of each task model in the mixture; thus, the resulting model can better generalize and adapt to unseen tasks. With extensive experiments conducted on robot navigation and locomotion domains, we show that our method successfully facilitates scalable lifelong RL and outperforms relevant existing methods.
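The Chinese restaurant process prior mentioned in the abstract can be illustrated with a minimal sketch: a new task joins an existing cluster with probability proportional to that cluster's size, or opens a new mixture component with probability proportional to a concentration parameter. This is a generic CRP simulation, not the paper's implementation; the function name `crp_assign` and the choice `alpha=1.0` are illustrative assumptions.

```python
import random

def crp_assign(counts, alpha):
    """Sample a cluster index for a new task under a Chinese restaurant
    process prior: join existing cluster k with probability proportional
    to counts[k], or open a new cluster with probability proportional
    to the concentration parameter alpha."""
    total = sum(counts) + alpha
    r = random.uniform(0, total)
    acc = 0.0
    for k, n in enumerate(counts):
        acc += n
        if r < acc:
            return k          # join existing cluster k
    return len(counts)        # instantiate a new mixture component

# Simulate 20 streaming task arrivals (illustrative only).
random.seed(0)
counts = []
for _ in range(20):
    k = crp_assign(counts, alpha=1.0)
    if k == len(counts):
        counts.append(0)      # model capacity expands as needed
    counts[k] += 1
print(counts)  # cluster sizes; expected cluster count grows roughly as alpha * log(n)
```

In the paper this prior is combined with an EM procedure over the full Bayesian nonparametric mixture, so cluster assignments there depend on task-model likelihoods as well, not on counts alone.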


Similar Articles

1
A Dirichlet Process Mixture of Robust Task Models for Scalable Lifelong Reinforcement Learning.
IEEE Trans Cybern. 2023 Dec;53(12):7509-7520. doi: 10.1109/TCYB.2022.3170485. Epub 2023 Nov 29.
2
Lifelong Incremental Reinforcement Learning With Online Bayesian Inference.
IEEE Trans Neural Netw Learn Syst. 2022 Aug;33(8):4003-4016. doi: 10.1109/TNNLS.2021.3055499. Epub 2022 Aug 3.
3
Dynamics-Adaptive Continual Reinforcement Learning via Progressive Contextualization.
IEEE Trans Neural Netw Learn Syst. 2024 Oct;35(10):14588-14602. doi: 10.1109/TNNLS.2023.3280085. Epub 2024 Oct 7.
4
Context-Based Meta-Reinforcement Learning With Bayesian Nonparametric Models.
IEEE Trans Pattern Anal Mach Intell. 2024 Oct;46(10):6948-6965. doi: 10.1109/TPAMI.2024.3386780. Epub 2024 Sep 5.
5
Lifelong Mixture of Variational Autoencoders.
IEEE Trans Neural Netw Learn Syst. 2023 Jan;34(1):461-474. doi: 10.1109/TNNLS.2021.3096457. Epub 2023 Jan 5.
6
CL3: Generalization of Contrastive Loss for Lifelong Learning.
J Imaging. 2023 Nov 23;9(12):259. doi: 10.3390/jimaging9120259.
7
Meta-Reinforcement Learning in Nonstationary and Nonparametric Environments.
IEEE Trans Neural Netw Learn Syst. 2024 Oct;35(10):13604-13618. doi: 10.1109/TNNLS.2023.3270298. Epub 2024 Oct 7.
8
Adversarial Feature Alignment: Avoid Catastrophic Forgetting in Incremental Task Lifelong Learning.
Neural Comput. 2019 Nov;31(11):2266-2291. doi: 10.1162/neco_a_01232. Epub 2019 Sep 16.
9
Hierarchical clustering optimizes the tradeoff between compositionality and expressivity of task structures for flexible reinforcement learning.
Artif Intell. 2022 Nov;312. doi: 10.1016/j.artint.2022.103770. Epub 2022 Aug 5.
10
VLAD: Task-agnostic VAE-based lifelong anomaly detection.
Neural Netw. 2023 Aug;165:248-273. doi: 10.1016/j.neunet.2023.05.032. Epub 2023 May 27.