Time-scale invariant contingency yields one-shot reinforcement learning despite extremely long delays to reinforcement.

Affiliations

Department of Psychology & Rutgers Center for Cognitive Sciences, Rutgers, The State University of New Jersey, Piscataway, NJ 08854-8020.

Department of Psychology, Utah State University, Logan, UT 84322-2810.

Publication information

Proc Natl Acad Sci U S A. 2024 Jul 23;121(30):e2405451121. doi: 10.1073/pnas.2405451121. Epub 2024 Jul 15.

Abstract

Reinforcement learning inspires much theorizing in neuroscience, cognitive science, machine learning, and AI. A central question concerns the conditions that produce the perception of a contingency between an action and reinforcement: the assignment-of-credit problem. Contemporary models of associative and reinforcement learning do not leverage the temporal metrics (measured intervals). Our information-theoretic approach formalizes contingency by time-scale invariant temporal mutual information. It predicts that learning may proceed rapidly even with extremely long action-reinforcer delays. We show that rats can learn an action after a single reinforcement, even with a 16-min delay between the action and reinforcement (15-fold longer than any delay previously shown to support such learning). By leveraging metric temporal information, our solution obviates the need for windows of associability, exponentially decaying eligibility traces, microstimuli, or distributions over Bayesian belief states. Its three equations have no free parameters; they predict one-shot learning without iterative simulation.
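The abstract does not state the paper's three equations, so the sketch below does not reproduce them. It only illustrates, under an assumed stand-in measure, why time-scale invariance matters for long delays. The stand-in is a contingency score built purely from a ratio of intervals: hypothetically, log2 of a background (context) wait for reinforcement divided by the action-reinforcer delay. Such a score is unchanged when every interval is scaled by the same factor. By contrast, an exponentially decaying eligibility trace with a fixed time constant all but vanishes at a 16-min delay. The function names `ratio_information_bits` and `eligibility_trace`, the interval values, and `tau` are all hypothetical illustrations, not the authors' model.

```python
import math


def ratio_information_bits(context_interval: float, delay: float) -> float:
    """Hypothetical scale-invariant contingency score (an assumption, not the
    paper's equations): log2 of the background wait for reinforcement divided
    by the action-reinforcer delay. It depends only on the ratio of intervals."""
    if context_interval <= 0 or delay <= 0:
        raise ValueError("intervals must be positive")
    return math.log2(context_interval / delay)


def eligibility_trace(delay: float, tau: float) -> float:
    """Exponentially decaying trace with a fixed (hypothetical) time constant tau."""
    return math.exp(-delay / tau)


if __name__ == "__main__":
    # Rescaling both intervals by the same factor (here 16x) leaves the
    # ratio-based score unchanged. Units are minutes; the values are illustrative.
    short = ratio_information_bits(context_interval=10.0, delay=1.0)
    long = ratio_information_bits(context_interval=160.0, delay=16.0)
    print(f"1-min delay: {short:.3f} bits; 16-min delay: {long:.3f} bits")

    # A trace with a fixed time constant is not scale invariant: it collapses
    # as the delay grows from 1 min to 16 min.
    tau = 1.0  # minutes (assumed)
    print(f"trace at 1 min: {eligibility_trace(1.0, tau):.3g}")
    print(f"trace at 16 min: {eligibility_trace(16.0, tau):.3g}")
```

With these made-up values, scaling both intervals by 16 keeps the score at log2(10), about 3.32 bits. Over the same change, the fixed-tau trace drops from about 0.37 to about 1e-7.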

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dfcd/11287270/0a9a71d741af/pnas.2405451121fig01.jpg