Palminteri Stefano, Kilford Emma J, Coricelli Giorgio, Blakemore Sarah-Jayne
Institute of Cognitive Neuroscience, University College London, London, United Kingdom.
Laboratoire de Neurosciences Cognitives, École Normale Supérieure, Paris, France.
PLoS Comput Biol. 2016 Jun 20;12(6):e1004953. doi: 10.1371/journal.pcbi.1004953. eCollection 2016 Jun.
Adolescence is a period of life characterised by changes in learning and decision-making. Learning and decision-making do not rely on a unitary system, but instead require the coordination of different cognitive processes that can be mathematically formalised as dissociable computational modules. Here, we aimed to trace the developmental time-course of the computational modules responsible for learning from reward or punishment, and learning from counterfactual feedback. Adolescents and adults carried out a novel reinforcement learning paradigm in which participants learned the association between cues and probabilistic outcomes, where the outcomes differed in valence (reward versus punishment) and feedback was either partial or complete (either the outcome of the chosen option only, or the outcomes of both the chosen and unchosen option, were displayed). Computational strategies changed during development: whereas adolescents' behaviour was better explained by a basic reinforcement learning algorithm, adults' behaviour integrated increasingly complex computational features, namely a counterfactual learning module (enabling enhanced performance in the presence of complete feedback) and a value contextualisation module (enabling symmetrical reward and punishment learning). Unlike adults, adolescent performance did not benefit from counterfactual (complete) feedback. In addition, while adults learned symmetrically from both reward and punishment, adolescents learned from reward but were less likely to learn from punishment. This tendency to rely on rewards and not to consider alternative consequences of actions might contribute to our understanding of decision-making in adolescence.
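The three computational modules named in the abstract can be illustrated with a minimal sketch. The abstract gives no equations, so everything below is an assumption: the delta-rule form, the parameter names (`alpha_c`, `alpha_u`, `alpha_v`), the context-value update, and the softmax choice rule are common choices in the reinforcement learning literature, not necessarily the authors' exact model. With `alpha_u = 0` and `alpha_v = 0` the learner reduces to a basic reinforcement learning algorithm; a positive `alpha_u` adds learning from the unchosen outcome under complete feedback; a positive `alpha_v` adds a context value against which outcomes are judged, so that avoiding a loss in a punishment context can register as a relatively good outcome.

```python
import math


def softmax_choice_prob(q_a, q_b, beta):
    """Probability of choosing option A over option B (two-option softmax).

    beta is a hypothetical inverse-temperature parameter.
    """
    return 1.0 / (1.0 + math.exp(-beta * (q_a - q_b)))


def update(q, v, chosen, r_chosen, r_unchosen=None,
           alpha_c=0.3, alpha_u=0.0, alpha_v=0.0):
    """Apply one learning step and return the new (q, v).

    q          : list with the current values of the two options
    v          : current value of the choice context
    chosen     : index (0 or 1) of the chosen option
    r_chosen   : outcome of the chosen option
    r_unchosen : outcome of the unchosen option, or None under partial feedback
    alpha_c    : learning rate for the chosen option (basic module)
    alpha_u    : learning rate for the unchosen option (counterfactual module)
    alpha_v    : learning rate for the context value (contextualisation module)
    """
    q = list(q)  # copy so the caller's list is not mutated
    unchosen = 1 - chosen

    # Value contextualisation (assumed form): track the average outcome of
    # the context. Under partial feedback the unchosen option's current
    # value stands in for its unseen outcome.
    if alpha_v > 0:
        other = r_unchosen if r_unchosen is not None else q[unchosen]
        v += alpha_v * (0.5 * (r_chosen + other) - v)

    # Basic module: delta-rule update of the chosen option, with the
    # outcome expressed relative to the context value (v stays 0 when
    # contextualisation is switched off).
    q[chosen] += alpha_c * (r_chosen - v - q[chosen])

    # Counterfactual module: update the unchosen option only when its
    # outcome was actually displayed (complete feedback).
    if alpha_u > 0 and r_unchosen is not None:
        q[unchosen] += alpha_u * (r_unchosen - v - q[unchosen])

    return q, v


if __name__ == "__main__":
    # Punishment context, complete feedback: the chosen option avoided a
    # loss (0.0) while the unchosen option would have lost (-1.0).
    basic = update([0.0, 0.0], 0.0, 0, 0.0, -1.0)
    full = update([0.0, 0.0], 0.0, 0, 0.0, -1.0, alpha_u=0.3, alpha_v=0.5)
    print("basic learner:", basic)
    print("full learner: ", full)
```

In this example the basic learner leaves both option values unchanged after a successfully avoided loss, whereas the full learner assigns the chosen option a positive value and the unchosen option a negative one. This is one way to see how contextualisation and counterfactual learning together could support symmetrical learning from reward and punishment.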