Time-scale invariant contingency yields one-shot reinforcement learning despite extremely long delays to reinforcement.

Author information

Department of Psychology & Rutgers Center for Cognitive Sciences, Rutgers The State University of New Jersey, Piscataway, NJ 08854-8020.

Department of Psychology, Utah State University, Logan, UT 84322-2810.

Publication information

Proc Natl Acad Sci U S A. 2024 Jul 23;121(30):e2405451121. doi: 10.1073/pnas.2405451121. Epub 2024 Jul 15.

DOI:10.1073/pnas.2405451121

PMID:39008663

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11287270/

Abstract

Reinforcement learning inspires much theorizing in neuroscience, cognitive science, machine learning, and AI. A central question concerns the conditions that produce the perception of a contingency between an action and reinforcement (the assignment-of-credit problem). Contemporary models of associative and reinforcement learning do not leverage temporal metrics (measured intervals). Our information-theoretic approach formalizes contingency by time-scale invariant temporal mutual information. It predicts that learning may proceed rapidly even with extremely long action-reinforcer delays. We show that rats can learn an action after a single reinforcement, even with a 16-min delay between the action and reinforcement (15-fold longer than any delay previously shown to support such learning). By leveraging metric temporal information, our solution obviates the need for windows of associability, exponentially decaying eligibility traces, microstimuli, or distributions over Bayesian belief states. Its three equations have no free parameters; they predict one-shot learning without iterative simulation.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dfcd/11287270/0a9a71d741af/pnas.2405451121fig01.jpg
