Reinforcement Learning with Limited Reinforcement: Using Bayes Risk for Active Learning in POMDPs

Authors

Finale Doshi, Joelle Pineau, Nicholas Roy

Affiliation

Massachusetts Institute of Technology, Cambridge, MA, USA

Publication information

Proc Int Conf Mach Learn. 2008;301:256-263. doi: 10.1901/jaba.2008.301-256.

DOI:10.1901/jaba.2008.301-256

PMID:20467572

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2868199/

Abstract

Partially Observable Markov Decision Processes (POMDPs) have succeeded in planning domains that require balancing actions that increase an agent's knowledge and actions that increase an agent's reward. Unfortunately, most POMDPs are defined with a large number of parameters which are difficult to specify only from domain knowledge. In this paper, we present an approximation approach that allows us to treat the POMDP model parameters as additional hidden state in a "model-uncertainty" POMDP. Coupled with model-directed queries, our planner actively learns good policies. We demonstrate our approach on several POMDP problems.
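The central idea of the abstract, folding unknown model parameters into the hidden state and asking for help when acting is too risky, can be illustrated with a small sketch. It rests on assumptions that are not taken from the paper: parameter uncertainty is represented by a finite set of fully specified candidate models (dicts with hypothetical keys `T`, `O`, `R`), the joint belief over (model, state) is updated with an ordinary Bayes filter, and each model's value is approximated by one-step expected reward. The hypothetical functions `update_joint_belief`, `bayes_risk_action` and `act_or_query` and the parameter `query_cost` are illustrative only; the authors' planner, priors and query types may differ.

```python
from typing import Dict, List, Tuple

# Illustrative encoding (hypothetical, not from the paper):
#   model["T"][a][s][s2] = P(s2 | s, a)
#   model["O"][a][s2][o] = P(o | s2, a)
#   model["R"][s][a]     = immediate reward
#   belief[m][s]         = P(model = m, state = s)
Model = Dict[str, list]
Belief = List[List[float]]


def update_joint_belief(belief: Belief, models: List[Model],
                        action: int, obs: int) -> Belief:
    """Bayes filter over the joint hidden state (model m, world state s).

    The model index is extra hidden state that never changes, so only s
    transitions; the observation reweights both s and m.
    """
    new_belief = []
    for m, model in enumerate(models):
        T, O = model["T"][action], model["O"][action]
        n = len(T)
        row = []
        for s2 in range(n):
            predicted = sum(belief[m][s] * T[s][s2] for s in range(n))
            row.append(predicted * O[s2][obs])
        new_belief.append(row)
    total = sum(sum(row) for row in new_belief)
    if total == 0:
        raise ValueError("observation has zero probability under every model")
    return [[p / total for p in row] for row in new_belief]


def bayes_risk_action(belief: Belief, models: List[Model]) -> Tuple[int, float]:
    """Return the action with the least expected regret and its (<= 0) risk.

    Assumption: one-step expected reward stands in for each model's value
    function, which a real planner would obtain by solving the POMDP.
    """
    n_actions = len(models[0]["R"][0])
    p_model = [sum(row) for row in belief]  # marginal P(model = m)
    q = []  # q[m][a]: expected immediate reward of action a under model m
    for m, model in enumerate(models):
        if p_model[m] == 0:
            q.append([0.0] * n_actions)  # zero-weight model contributes nothing
            continue
        b_m = [p / p_model[m] for p in belief[m]]  # P(s | model = m)
        q.append([sum(b_m[s] * model["R"][s][a] for s in range(len(b_m)))
                  for a in range(n_actions)])
    # Risk of a: sum_m P(m) * (Q_m(a) - max_a' Q_m(a')); 0 means no expected regret.
    risks = [sum(p_model[m] * (q[m][a] - max(q[m])) for m in range(len(models)))
             for a in range(n_actions)]
    best = max(range(n_actions), key=lambda a: risks[a])
    return best, risks[best]


def act_or_query(belief: Belief, models: List[Model],
                 query_cost: float) -> Tuple[str, object]:
    """Query when the expected regret of the best action exceeds query_cost."""
    action, risk = bayes_risk_action(belief, models)
    if -risk > query_cost:
        return ("query", None)
    return ("act", action)


if __name__ == "__main__":
    static = [[1, 0], [0, 1]]  # world states never change
    model_a = {"T": [static, static],
               "O": [[[0.9, 0.1], [0.9, 0.1]]] * 2,
               "R": [[1, 0], [1, 0]]}  # action 0 is always better
    model_b = {"T": [static, static],
               "O": [[[0.1, 0.9], [0.1, 0.9]]] * 2,
               "R": [[0, 1], [0, 1]]}  # action 1 is always better
    models = [model_a, model_b]
    uniform = [[0.25, 0.25], [0.25, 0.25]]
    print(act_or_query(uniform, models, query_cost=0.2))  # → ('query', None)
    b = update_joint_belief(uniform, models, action=0, obs=0)
    print(act_or_query(b, models, query_cost=0.2))  # → ('act', 0)
```

With a uniform belief the two candidate models disagree on the best action, so the expected regret (0.5) exceeds the cost and the agent queries; after one observation that favors `model_a`, the regret falls to about 0.1 and the agent acts.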

Similar articles

Modeling and Planning with Macro-Actions in Decentralized POMDPs.

J Artif Intell Res. 2019;64:817-859. doi: 10.1613/jair.1.11418. Epub 2019 Mar 25.

Generating Reward Functions Using IRL Towards Individualized Cancer Screening.

Artif Intell Health (2018). 2019;11326:213-227. doi: 10.1007/978-3-030-12738-1_16. Epub 2019 Feb 21.

Learning State-Variable Relationships in POMCP: A Framework for Mobile Robots.

Front Robot AI. 2022 Jul 19;9:819107. doi: 10.3389/frobt.2022.819107. eCollection 2022.

Active Inference and Reinforcement Learning: A Unified Inference on Continuous State and Action Spaces Under Partial Observability.

Neural Comput. 2024 Sep 17;36(10):2073-2135. doi: 10.1162/neco_a_01698.

Online Planning Algorithms for POMDPs.

J Artif Intell Res. 2008 Jul 1;32(2):663-704.

Deep Reinforcement Learning With Modulated Hebbian Plus Q-Network Architecture.

IEEE Trans Neural Netw Learn Syst. 2022 May;33(5):2045-2056. doi: 10.1109/TNNLS.2021.3110281. Epub 2022 May 2.

Addressing structural and observational uncertainty in resource management.

J Environ Manage. 2014 Jan 15;133:27-36. doi: 10.1016/j.jenvman.2013.11.004. Epub 2013 Dec 20.

Partial observability and management of ecological systems.

Ecol Evol. 2022 Sep 13;12(9):e9197. doi: 10.1002/ece3.9197. eCollection 2022 Sep.

Task-based decomposition of factored POMDPs.

IEEE Trans Cybern. 2014 Feb;44(2):208-16. doi: 10.1109/TCYB.2013.2252009.

Cited by

An Active Inference Approach to Dissecting Reasons for Nonadherence to Antidepressants.

Biol Psychiatry Cogn Neurosci Neuroimaging. 2021 Sep;6(9):919-934. doi: 10.1016/j.bpsc.2019.11.012. Epub 2019 Dec 5.

Implicit Value Updating Explains Transitive Inference Performance: The Betasort Model.

PLoS Comput Biol. 2015 Sep 25;11(9):e1004523. doi: 10.1371/journal.pcbi.1004523. eCollection 2015.

Emotion and decision-making: affect-driven belief systems in anxiety and depression.

Trends Cogn Sci. 2012 Sep;16(9):476-83. doi: 10.1016/j.tics.2012.07.009. Epub 2012 Aug 13.

Informing sequential clinical decision-making through reinforcement learning: an empirical study.

Mach Learn. 2011 Jul 1;84(1-2):109-136. doi: 10.1007/s10994-010-5229-0.