Department of Psychological and Brain Sciences, Johns Hopkins University, Baltimore, MD 21218, USA; Johns Hopkins Kavli Neuroscience Discovery Institute, Johns Hopkins University, Baltimore, MD 21218, USA; The Solomon Snyder Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA.
Department of Psychological and Brain Sciences, Johns Hopkins University, Baltimore, MD 21218, USA; Johns Hopkins Kavli Neuroscience Discovery Institute, Johns Hopkins University, Baltimore, MD 21218, USA; The Solomon Snyder Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA; Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA.
Curr Biol. 2024 May 20;34(10):2107-2117.e5. doi: 10.1016/j.cub.2024.04.017. Epub 2024 Apr 26.
Humans, even as infants, use cognitive strategies, such as exploration and hypothesis testing, to learn about causal interactions in the environment. In animal learning studies, however, it is challenging to disentangle higher-order behavioral strategies from errors arising from imperfect task knowledge or inherent biases. Here, we trained head-fixed mice on a wheel-based auditory two-choice task and exploited the intra- and inter-animal variability to understand the drivers of errors during learning. During learning, performance errors are dominated by a choice bias, which, despite appearing maladaptive, reflects a dynamic strategy. Early in learning, mice develop an internal model of the task contingencies such that violating their expectation of reward on correct trials (by using short blocks of non-rewarded "probe" trials) leads to an abrupt shift in strategy. During the probe block, mice behave more accurately with less bias, thereby using their learned stimulus-action knowledge to test whether the outcome contingencies have changed. Despite having this knowledge, mice continued to exhibit a strong choice bias during reinforced trials. This choice bias operates on a timescale of tens to hundreds of trials with a dynamic structure, shifting between left, right, and unbiased epochs. Biased epochs also coincided with faster motor kinematics. Although bias decreased across learning, expert mice continued to exhibit short bouts of biased choices interspersed with longer bouts of unbiased choices and higher performance. These findings collectively suggest that during learning, rodents actively probe their environment in a structured manner to refine their decision-making and maintain long-term flexibility.