Hall-McMaster Sam, Tomov Momchil S, Gershman Samuel J, Schuck Nicolas W
Department of Psychology and Center for Brain Science, Harvard University, Cambridge, Massachusetts, United States of America.
Max Planck Institute for Human Development, Berlin, Germany.
PLoS Biol. 2025 Jun 5;23(6):e3003174. doi: 10.1371/journal.pbio.3003174. eCollection 2025 Jun.
Generalization from past experience is an important feature of intelligent systems. When faced with a new task, one efficient computational approach is to evaluate solutions to earlier tasks as candidates for reuse. Consistent with this idea, we found that human participants (n = 38) learned optimal solutions to a set of training tasks and generalized them to novel test tasks in a reward-selective manner. This behavior was consistent with a computational process based on the successor representation, known as successor features and generalized policy improvement (SF&GPI). Neither model-free perseveration nor model-based control using a complete model of the environment could explain choice behavior. Decoding from functional magnetic resonance imaging data revealed that solutions from the SF&GPI algorithm were activated on test tasks in visual and prefrontal cortex. This activation was functionally linked to behavior: stronger activation of SF&GPI solutions in visual areas was associated with increased behavioral reuse. These findings point to a possible neural implementation of an adaptive algorithm for generalization across tasks.
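The reuse step that SF&GPI describes can be illustrated with a minimal sketch. It is not the authors' implementation; it assumes the standard formulation in which each stored policy has a successor-feature vector per action, a new task is defined by a reward-weight vector, and the agent picks the action with the highest value across all stored policies. The names `gpi_action`, `policy_A`, `policy_B`, and all numbers are hypothetical.

```python
from typing import Dict, List, Tuple


def dot(u: List[float], v: List[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def gpi_action(
    successor_features: Dict[str, Dict[str, List[float]]],
    task_weights: List[float],
) -> Tuple[str, str, float]:
    """Generalized policy improvement at a single state.

    successor_features[policy][action] is the expected (discounted) feature
    vector obtained by taking `action` and then following `policy`.
    The value of that pair on the new task is its dot product with
    `task_weights`. Returns the (policy, action, value) with the highest value.
    """
    best = None
    for policy, per_action in successor_features.items():
        for action, psi in per_action.items():
            value = dot(psi, task_weights)
            if best is None or value > best[2]:
                best = (policy, action, value)
    return best


# Hypothetical successor features learned on two training tasks.
psi = {
    "policy_A": {"left": [1.0, 0.0], "middle": [0.5, 0.5], "right": [0.0, 0.2]},
    "policy_B": {"left": [0.2, 0.1], "middle": [0.1, 0.3], "right": [0.0, 1.0]},
}

# Hypothetical reward weights for a novel test task.
w_new = [-1.0, 2.0]

choice = gpi_action(psi, w_new)
print(choice)
```

In this toy case the agent only ever evaluates actions that were part of a previously learned solution, which is the reward-selective reuse the abstract describes; a fully model-based agent would instead plan over the complete environment model.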