Department of Psychology & Rutgers Center for Cognitive Sciences, Rutgers, The State University of New Jersey, Piscataway, NJ 08854-8020.
Department of Psychology, Utah State University, Logan, UT 84322-2810.
Proc Natl Acad Sci U S A. 2024 Jul 23;121(30):e2405451121. doi: 10.1073/pnas.2405451121. Epub 2024 Jul 15.
Reinforcement learning inspires much theorizing in neuroscience, cognitive science, machine learning, and AI. A central question concerns the conditions that produce the perception of a contingency between an action and reinforcement (the assignment-of-credit problem). Contemporary models of associative and reinforcement learning do not leverage temporal metrics (measured intervals). Our information-theoretic approach formalizes contingency as time-scale-invariant temporal mutual information. It predicts that learning may proceed rapidly even with extremely long action-reinforcer delays. We show that rats can learn an action after a single reinforcement, even with a 16-min delay between the action and the reinforcement (15-fold longer than any delay previously shown to support such learning). By leveraging metric temporal information, our solution obviates the need for windows of associability, exponentially decaying eligibility traces, microstimuli, or distributions over Bayesian belief states. Its three equations have no free parameters; they predict one-shot learning without iterative simulation.
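The time-scale invariance claim can be made concrete with the informativeness measure from the authors' earlier work (Balsam & Gallistel, 2009): an event conveys log2(C/T) bits about reinforcement timing, where C is the average context-reinforcer interval and T is the event-reinforcer delay. The sketch below is illustrative only; the intervals are hypothetical, and this single function is not the paper's three parameter-free equations.

```python
import math

def informativeness_bits(context_interval: float, delay: float) -> float:
    """Temporal information (bits) an event carries about when reinforcement
    will occur, per the Balsam-Gallistel measure log2(C/T).

    Time-scale invariant: multiplying both intervals by the same factor
    leaves the result unchanged. Illustrative sketch; not the paper's
    exact equations.
    """
    return math.log2(context_interval / delay)

# Hypothetical numbers: reinforcers otherwise arrive on average every
# 480 min of session time (C), and the action-reinforcer delay is
# 16 min (T). The delay still carries log2(480/16) ~= 4.9 bits, which
# on this account is why so long a delay can support learning.
print(informativeness_bits(480.0, 16.0))

# Rescaling both intervals tenfold yields the same value,
# demonstrating the time-scale invariance.
print(informativeness_bits(4800.0, 160.0))
```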