Apps Matthew A J, Lesage Elise, Ramnani Narender
Nuffield Department of Clinical Neuroscience, University of Oxford, Oxford OX1 9DU, United Kingdom, Department of Experimental Psychology, University of Oxford, Oxford OX1 2JD, United Kingdom, Department of Psychology, Royal Holloway, University of London, Surrey TW20 0EX, United Kingdom.
Department of Psychology, Royal Holloway, University of London, Surrey TW20 0EX, United Kingdom, and Neuroimaging Research Branch, Intramural Research Program, National Institute on Drug Abuse, National Institutes of Health, Baltimore, Maryland 21224.
J Neurosci. 2015 Feb 18;35(7):2904-13. doi: 10.1523/JNEUROSCI.3669-14.2015.
Reinforcement learning (RL) theory posits that learning is driven by discrepancies between the predicted and actual outcomes of actions (prediction errors [PEs]). In social environments, learning is often guided by similar RL mechanisms. For example, teachers monitor the actions of students and provide feedback to them. This feedback evokes PEs in students that guide their learning. We report the first study that investigates the neural mechanisms that underpin RL signals in the brain of a teacher. Neurons in the anterior cingulate cortex (ACC) signal PEs when learning from the outcomes of one's own actions but also signal information when outcomes are received by others. Does a teacher's ACC signal PEs when monitoring a student's learning? Using fMRI, we studied brain activity in human subjects (teachers) as they taught a confederate (student) action-outcome associations by providing positive or negative feedback. We examined activity time-locked to the student's responses, when teachers infer student predictions and know actual outcomes. We fitted an RL-based computational model to the behavior of the student to characterize their learning, and examined whether a teacher's ACC signals when a student's predictions are wrong. In line with our hypothesis, activity in the teacher's ACC covaried with the PE values in the model. Additionally, activity in the teacher's insula and ventromedial prefrontal cortex covaried with the predicted value according to the student. Our findings highlight that the ACC signals PEs vicariously for others' erroneous predictions, when monitoring and instructing their learning. These results suggest that RL mechanisms, processed vicariously, may underpin and facilitate teaching behaviors.
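To make the two model-derived quantities in the abstract concrete (the student's predicted value and the PE), here is a minimal sketch of a generic Rescorla-Wagner-style update. The abstract does not give the model's equations, so this form is an assumption for illustration, not the authors' fitted model. The function name `rescorla_wagner`, the learning rate `alpha`, and the starting value `v0` are all hypothetical.

```python
def rescorla_wagner(outcomes, alpha=0.5, v0=0.0):
    """Return per-trial predicted values and prediction errors.

    Illustrative assumption only: a generic delta-rule learner,
    not the model reported in the study.
    outcomes: sequence of 1 (positive feedback) or 0 (negative feedback).
    alpha: hypothetical learning rate in [0, 1].
    v0: hypothetical initial predicted value.
    """
    value = v0
    values, errors = [], []
    for outcome in outcomes:
        pe = outcome - value      # prediction error: actual minus predicted
        values.append(value)      # prediction held before the outcome
        errors.append(pe)
        value += alpha * pe       # move the prediction toward the outcome
    return values, errors


values, errors = rescorla_wagner([1, 1, 0])
print(values)  # predicted value on each trial
print(errors)  # prediction error on each trial
```

In an analysis of the kind the abstract describes, trial-wise series like `errors` and `values` would serve as the parametric regressors tested against activity in the ACC and in the insula and ventromedial prefrontal cortex, respectively.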