Graduate Program in Neuroscience, Harvard University, Cambridge, Massachusetts 02138.
Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts 02138.
J Neurosci. 2020 May 27;40(22):4335-4347. doi: 10.1523/JNEUROSCI.2604-19.2020. Epub 2020 Apr 22.
Rodents can successfully learn multiple novel stimulus-response associations after only a few repetitions when the contingencies predict reward. The circuits modified during such reinforcement learning to support decision-making are not known, but the olfactory tubercle (OT) and posterior piriform cortex (pPC) are candidates for decoding reward category from olfactory sensory input and relaying this information to cognitive and motor areas. Through single-cell recordings in behaving male and female C57BL/6 mice, we show here that an explicit representation for reward category emerges in the OT within minutes of learning a novel odor-reward association, whereas the pPC lacks an explicit representation even after weeks of overtraining. The explicit reward category representation in OT is visible in the first sniff (50-100 ms) of an odor on each trial, and precedes the motor action. Together, these results suggest that the coding of stimulus information required for reward prediction does not occur within olfactory cortex, but rather in circuits involving the olfactory striatum.

SIGNIFICANCE STATEMENT Rodents are olfactory specialists and can use odors to learn contingencies quickly and well. We have found that mice can readily learn to place multiple odors into rewarded and unrewarded categories. Once they have learned the rule, they can do such categorization in a matter of minutes (<10 trials). We found that neural activity in olfactory cortex largely reflects sensory coding, with very little explicit information about categories. By contrast, neural activity in a brain region in the ventral striatum is rapidly modified in a matter of minutes to reflect reward category. Our experiments set up a paradigm for studying rapid sensorimotor reinforcement in a circuit that is right at the interface of sensory input and reward areas.