Anti Ingel, Abdullah Makkeh, Oriol Corcoll, Raul Vicente
Institute of Computer Science, University of Tartu, Narva mnt 18, 51009 Tartu, Estonia.
Göttingen Campus Institute for Dynamics of Biological Networks, University of Göttingen, 37075 Göttingen, Germany.
Entropy (Basel). 2022 Mar 13;24(3):401. doi: 10.3390/e24030401.
Intuitively, the level of autonomy of an agent is related to the degree to which the agent's goals and behaviour are decoupled from the immediate control of the environment. Here, we capitalise on a recent information-theoretic formulation of autonomy and introduce an algorithm for calculating autonomy in the limit of the time step approaching infinity. We tackle the question of how the autonomy level of an agent changes during training. In particular, we use the partial information decomposition (PID) framework to monitor the levels of autonomy and environment internalisation of reinforcement-learning (RL) agents. We performed experiments in two environments: a grid world, in which the agent has to collect food, and a repeating-pattern environment, in which the agent has to learn to imitate a sequence of actions by memorising it. PID also allows us to quantify how much the agent relies on its internal memory, versus its observations, when transitioning to its next internal state. The experiments show that specific PID terms strongly correlate with the obtained reward and with the agent's behaviour under perturbations of the observations.
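To make the memory-versus-observation question concrete: a two-source PID splits the information that the agent's memory M and observation O jointly carry about the next internal state S' into a redundant term, two unique terms, and a synergistic term. The sketch below is a minimal illustration, not the estimator used in the paper; it assumes a small discrete joint distribution and swaps in the simple minimum-mutual-information (MMI) redundancy, whereas the paper relies on a specific PID measure from the literature.

```python
import numpy as np

def mutual_information(p_xy):
    """I(X;Y) in bits from a joint pmf given as a 2-D array."""
    px = p_xy.sum(axis=1, keepdims=True)
    py = p_xy.sum(axis=0, keepdims=True)
    nz = p_xy > 0
    return float(np.sum(p_xy[nz] * np.log2(p_xy[nz] / (px @ py)[nz])))

def pid_mmi(p):
    """Two-source PID of a joint pmf p[m, o, s] over (memory M,
    observation O, next internal state S'), using the MMI
    redundancy I_red = min(I(M;S'), I(O;S'))."""
    p = p / p.sum()                       # normalise defensively
    i_m = mutual_information(p.sum(axis=1))           # I(M; S')
    i_o = mutual_information(p.sum(axis=0))           # I(O; S')
    # I(M,O; S'): flatten the two sources into one variable
    i_joint = mutual_information(p.reshape(-1, p.shape[2]))
    red = min(i_m, i_o)                   # redundant information
    return {"redundant": red,
            "unique_mem": i_m - red,      # carried only by memory
            "unique_obs": i_o - red,      # carried only by observation
            "synergy": i_joint - i_m - i_o + red}

# Toy check: S' = M XOR O with uniform inputs -> purely synergistic.
p = np.zeros((2, 2, 2))
for m in range(2):
    for o in range(2):
        p[m, o, m ^ o] = 0.25
print(pid_mmi(p))   # synergy = 1 bit, all other terms = 0
```

In this reading, a large unique-memory term indicates reliance on internal state (higher autonomy), while a large unique-observation term indicates that the environment drives the next state; the joint pmf would in practice be estimated from (memory, observation, next-state) samples collected during training.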