Cecchini Gloria, DePass Michael, Baspinar Emre, Andujar Marta, Ramawat Surabhi, Pani Pierpaolo, Ferraina Stefano, Destexhe Alain, Moreno-Bote Rubén, Cos Ignasi
Facultat de Matemàtiques i Informàtica, Universitat de Barcelona, Barcelona, Spain.
Center for Brain and Cognition, DTIC, Universitat Pompeu Fabra, Barcelona, Spain.
Front Behav Neurosci. 2024 Aug 12;18:1399394. doi: 10.3389/fnbeh.2024.1399394. eCollection 2024.
Learning to make adaptive decisions involves making choices, assessing their consequences, and leveraging this assessment to attain higher rewarding states. Despite a vast literature on value-based decision-making, relatively little is known about the cognitive processes underlying decisions in highly uncertain contexts. Real-world decisions are rarely accompanied by immediate feedback, explicit rewards, or complete knowledge of the environment. Making informed decisions in such contexts requires significant knowledge about the environment, which can only be gained via exploration. Here, we aim to understand and formalize the brain mechanisms underlying these processes. To this end, we first designed and performed an experimental task. Human participants had to learn to maximize reward while making sequences of decisions with only basic knowledge of the environment, and in the absence of explicit performance cues. Participants had to rely on their own internal assessment of performance to reveal a covert relationship between their choices and their subsequent consequences, and thereby find a strategy leading to the highest cumulative reward. Our results show that participants' reaction times were longer whenever the decision involved a future consequence, suggesting greater introspection whenever a delayed value had to be considered. The learning time varied significantly across participants. Second, we formalized the neurocognitive processes underlying decision-making within this task by combining mean-field representations of competing neural populations with a reinforcement learning mechanism. This model provided a plausible characterization of the brain dynamics underlying these processes and reproduced each aspect of the participants' behavior, from their reaction times and choices to their learning rates. In summary, both the experimental results and the model provide a principled explanation of how delayed value may be computed and incorporated into the neural dynamics of decision-making, and of how learning occurs in these uncertain scenarios.
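To make the modeling approach described in the abstract concrete, the sketch below illustrates the general idea of coupling a mean-field model of two competing neural populations with a reinforcement learning (delta-rule) value update. This is not the authors' implementation: all parameter values (tau, w_self, w_inh, alpha, the decision threshold) and the reward contingencies are illustrative assumptions introduced only for this example.

    import numpy as np

    # Illustrative sketch (not the paper's model): two competing mean-field
    # populations whose inputs are biased by action values learned with a
    # simple reinforcement-learning (delta) rule. All parameters are assumed.

    def phi(x):
        """Threshold-linear transfer function for the population rate."""
        return np.maximum(x, 0.0)

    def simulate_trial(values, tau=0.02, dt=1e-3, w_self=0.6, w_inh=0.8,
                       noise=0.3, threshold=15.0, t_max=2.0, rng=None):
        """Race between two populations; returns (choice, reaction_time)."""
        rng = np.random.default_rng() if rng is None else rng
        r = np.zeros(2)                       # population firing rates
        for step in range(int(t_max / dt)):
            # value-biased input, self-excitation, and cross-inhibition
            inp = values + w_self * r - w_inh * r[::-1]
            r += dt / tau * (-r + phi(inp)) \
                 + np.sqrt(dt) * noise * rng.standard_normal(2)
            r = np.maximum(r, 0.0)
            if r.max() >= threshold:          # first population to cross wins
                return int(np.argmax(r)), (step + 1) * dt
        return int(np.argmax(r)), t_max       # no crossing: forced choice

    def update_values(values, choice, reward, alpha=0.1):
        """Delta-rule update of the chosen option's value after feedback."""
        values[choice] += alpha * (reward - values[choice])
        return values

    # Example run: values drift toward the (hypothetical) reward contingencies,
    # biasing the competition and shortening reaction times over trials.
    rng = np.random.default_rng(0)
    values = np.array([5.0, 5.0])             # initial input biases
    true_reward = np.array([2.0, 8.0])        # assumed reward contingencies
    for trial in range(200):
        choice, rt = simulate_trial(values, rng=rng)
        values = update_values(values, choice, true_reward[choice])

In this kind of architecture, the learned values enter the population dynamics as input biases, so slower reaction times naturally arise when the competing inputs are close (e.g., when a delayed consequence must be weighed), and choices sharpen as learning progresses; the actual model in the paper is a more elaborate mean-field formulation.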