Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America.
Picower Institute for Learning and Memory, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America.
PLoS Comput Biol. 2023 Sep 14;19(9):e1011430. doi: 10.1371/journal.pcbi.1011430. eCollection 2023 Sep.
In reversal learning tasks, the behavior of humans and animals is often assumed to be uniform within single experimental sessions to facilitate data analysis and model fitting. However, the behavior of agents can vary substantially within a single session, as they execute different blocks of trials with different transition dynamics. Here, we observed that in a deterministic reversal learning task, mice displayed noisy and sub-optimal choice transitions even at the expert stages of learning. We investigated two sources of this sub-optimality. First, we found that mice exhibited a high lapse rate during task execution, reverting to unrewarded directions after choice transitions. Second, we unexpectedly found that a majority of mice did not execute a uniform strategy, but rather mixed between several behavioral modes with different transition dynamics. We quantified the use of such mixtures with a state-space model, the block Hidden Markov Model (blockHMM), to dissociate the mixtures of dynamic choice transitions in individual blocks of trials. Additionally, we found that blockHMM transition modes in rodent behavior can be accounted for by two different types of behavioral algorithms, model-free or inference-based learning, that might be used to solve the task. Combining these approaches, we found that mice used a mixture of exploratory, model-free strategies and deterministic, inference-based behavior in the task, explaining their overall noisy choice sequences. Together, our combined computational approach highlights intrinsic sources of noise in rodent reversal learning behavior and provides a richer description of behavior than conventional techniques, while uncovering the hidden states that underlie the block-by-block transitions.
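To make the notions of a choice transition and a lapse rate concrete, the following is a minimal sketch of a model-free (Q-learning) agent in a deterministic two-choice reversal task. It is an illustration only, not the authors' model or the blockHMM: the softmax choice rule, the function names, and all parameter values (`block_len`, `alpha`, `beta`, the five-trial window used for the lapse estimate) are assumptions chosen for this example.

```python
import math
import random


def simulate_q_learning(n_blocks=200, block_len=20, alpha=0.5, beta=5.0, seed=0):
    """Simulate a softmax Q-learning agent; all parameters are hypothetical.

    The rewarded side flips at the end of every block, and reward is
    deterministic: 1 for the correct side, 0 otherwise.
    Returns one list of per-trial correctness flags per block.
    """
    rng = random.Random(seed)
    q = [0.0, 0.0]   # action values for the two choices
    target = 0       # currently rewarded side
    correct = []
    for _ in range(n_blocks):
        row = []
        for _ in range(block_len):
            # Softmax over two actions reduces to a logistic function.
            p_choose_1 = 1.0 / (1.0 + math.exp(-beta * (q[1] - q[0])))
            choice = 1 if rng.random() < p_choose_1 else 0
            reward = 1.0 if choice == target else 0.0
            # Model-free update: only the chosen action's value changes.
            q[choice] += alpha * (reward - q[choice])
            row.append(choice == target)
        correct.append(row)
        target = 1 - target  # reversal at the block boundary
    return correct


def transition_curve(correct):
    """Fraction of correct choices at each trial position, averaged over blocks."""
    n_blocks = len(correct)
    block_len = len(correct[0])
    return [sum(block[t] for block in correct) / n_blocks for t in range(block_len)]


curve = transition_curve(simulate_q_learning())
# Crude lapse estimate (an assumption of this sketch): the error rate
# over the last five trial positions of the block-averaged curve.
lapse_estimate = 1.0 - sum(curve[-5:]) / 5
```

Here `curve` traces how quickly the simulated agent switches after a reversal, and `lapse_estimate` captures residual errors late in a block. Averaging over all blocks in this way is exactly the uniformity assumption the abstract questions, since it hides any block-to-block differences in transition dynamics.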