Trial-by-trial learning of successor representations in human behavior.

Author information

Kahn Ari E, Bassett Dani S, Daw Nathaniel D

Affiliations

Princeton Neuroscience Institute, Princeton University, Princeton, NJ, USA.

Department of Bioengineering, University of Pennsylvania, Philadelphia, PA, USA.

Publication information

bioRxiv. 2025 Jun 16:2024.11.07.622528. doi: 10.1101/2024.11.07.622528.

Abstract

Decisions in humans and other organisms depend, in part, on learning and using models that capture the statistical structure of the world, including the long-run expected outcomes of our actions. One prominent approach to forecasting such long-run outcomes is the successor representation (SR), which predicts future states aggregated over multiple timesteps. Although much behavioral and neural evidence suggests that people and animals use such a representation, it remains unknown how they acquire it. It has frequently been assumed to be learned by temporal difference bootstrapping (SR-TD(0)), but this assumption has largely not been empirically tested or compared to alternatives including eligibility traces (SR-TD(λ)). Here we address this gap by leveraging trial-by-trial reaction times in graph sequence learning tasks, which are favorable for studying learning dynamics because the long horizons in these studies differentiate the transient update dynamics of different learning rules. We examined the behavior of SR-TD(λ) on a probabilistic graph learning task alongside a number of alternatives, and found that behavior was best explained by a hybrid model which learned via SR-TD(λ) alongside an additional predictive model of recency. The relatively large λ we estimate indicates a predominant role of eligibility trace mechanisms over the bootstrap-based chaining typically assumed. Our results provide insight into how humans learn predictive representations, and demonstrate that people simultaneously learn the SR alongside lower-order predictions.
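The contrast between bootstrap-based chaining (λ = 0) and eligibility traces (λ > 0) can be illustrated with a minimal sketch of a textbook tabular SR-TD(λ) update. This is not the authors' fitted model: the function name `sr_td_lambda`, the identity initialization of the successor matrix, the accumulating trace, and the parameter values are all assumptions made for illustration.

```python
def sr_td_lambda(states, n_states, alpha=0.1, gamma=0.9, lam=0.8):
    """Hypothetical tabular SR-TD(lambda) learner over one state sequence.

    M[i][j] estimates the discounted future occupancy of state j
    when starting from state i.
    """
    # Assumption: identity initialization (each state predicts only itself).
    M = [[1.0 if i == j else 0.0 for j in range(n_states)]
         for i in range(n_states)]
    trace = [0.0] * n_states  # eligibility of recently visited states

    for s, s_next in zip(states[:-1], states[1:]):
        # Decay all traces, then mark the current state as eligible.
        trace = [gamma * lam * e for e in trace]
        trace[s] += 1.0

        # Vector-valued TD error for the current state's row of M.
        delta = [(1.0 if j == s else 0.0) + gamma * M[s_next][j] - M[s][j]
                 for j in range(n_states)]

        # Every eligible predecessor row is updated by the same error.
        for i in range(n_states):
            if trace[i] != 0.0:
                for j in range(n_states):
                    M[i][j] += alpha * trace[i] * delta[j]
    return M


# One pass through the sequence 0 -> 1 -> 2.
M_td0 = sr_td_lambda([0, 1, 2], 3, lam=0.0)
M_trace = sr_td_lambda([0, 1, 2], 3, lam=0.8)
print(M_td0[0][2], M_trace[0][2])
```

With λ = 0, state 0 learns nothing about state 2 on the first pass, because that prediction can only arrive by chaining through state 1 on later visits. With λ > 0, the trace credits state 0 immediately when the transition into state 2 is observed. This difference in transient dynamics is the kind of signature the abstract says long-horizon graph tasks can distinguish.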

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/112d/12262301/e3ea92faefad/nihpp-2024.11.07.622528v3-f0001.jpg
