Molano-Mazón Manuel, Shao Yuxiu, Duque Daniel, Yang Guangyu Robert, Ostojic Srdjan, de la Rocha Jaime
IDIBAPS, Rosselló 149, Barcelona 08036, Spain.
Laboratoire de Neurosciences Cognitives, INSERM U960, École Normale Supérieure - PSL Research University, 75005 Paris, France.
Curr Biol. 2023 Feb 27;33(4):622-638.e7. doi: 10.1016/j.cub.2022.12.044. Epub 2023 Jan 18.
The strategies found by animals facing a new task are determined both by individual experience and by structural priors evolved to leverage the statistics of natural environments. Rats quickly learn to capitalize on the trial-sequence correlations of two-alternative forced choice (2AFC) tasks after correct trials but consistently deviate from optimal behavior after error trials. To understand this outcome-dependent gating, we first show that recurrent neural networks (RNNs) trained on the same 2AFC task outperform rats, as they readily learn to use across-trial information after both correct and error trials. We hypothesize that, although RNNs can optimize their behavior in the 2AFC task without any a priori restrictions, rats' strategy is constrained by a structural prior adapted to a natural environment in which rewarded and non-rewarded actions provide largely asymmetric information. When RNNs are pre-trained on a more ecological task with more than two possible choices, the networks develop a strategy by which they gate off the across-trial evidence after errors, mimicking rats' behavior. Population analyses show that the pre-trained networks form an accurate representation of the sequence statistics independently of the outcome of the previous trial. After error trials, gating is implemented by a change in the network dynamics that temporarily decouples the categorization of the stimulus from the across-trial accumulated evidence. Our results suggest that the rats' suboptimal behavior reflects the influence of a structural prior that reacts to errors by isolating the network decision dynamics from the context, ultimately constraining performance in a 2AFC laboratory task.