Paunov Alexander, L'Hôtellier Maëva, Guo Dalin, He Zoe, Yu Angela, Meyniel Florent
INSERM-CEA Cognitive Neuroimaging Unit (UNICOG), NeuroSpin Center, CEA Paris-Saclay, Gif-sur-Yvette, France; Université de Paris, Paris, France.
Institut de Neuromodulation, GHU Paris, Psychiatrie et Neurosciences, Centre Hospitalier Sainte-Anne, Pôle Hospitalo-Universitaire 15, Université Paris Cité, Paris, France.
bioRxiv. 2024 Sep 12:2024.03.27.587016. doi: 10.1101/2024.03.27.587016.
Decision-making in noisy, changing, and partially observable environments entails a basic tradeoff between immediate reward and longer-term information gain, known as the exploration-exploitation dilemma. Computationally, an effective way to balance this tradeoff is to use uncertainty to guide exploration. Yet in humans, empirical findings are mixed, ranging from uncertainty seeking to indifference and avoidance. In a novel bandit task that better captures uncertainty-driven behavior, we find multiple roles for uncertainty in human choices. First, stable and psychologically meaningful individual differences in uncertainty preferences range from seeking to avoidance, which can manifest as null group-level effects. Second, uncertainty modulates the use of basic decision heuristics that imperfectly exploit immediate rewards: a repetition bias and a win-stay-lose-shift heuristic. These heuristics interact with uncertainty, with heuristic choices favored under higher uncertainty. These results, highlighting the rich and varied structure of reward-based choice, are a step toward understanding its functional basis and its dysfunction in psychopathology.
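To make the ideas in the abstract concrete, the following is a minimal illustrative sketch, not the authors' task or model. It assumes a two-armed Bernoulli bandit, Beta(1, 1) priors, and a logistic choice rule. The function names and the `bonus` and `beta` parameters are hypothetical and introduced only for illustration. A positive `bonus` stands for uncertainty seeking, a negative one for avoidance, and zero for indifference.

```python
import math


def beta_mean_sd(wins, losses):
    """Posterior mean and standard deviation under a Beta(1 + wins, 1 + losses) belief."""
    a, b = 1 + wins, 1 + losses
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(var)


def choice_prob_arm0(counts, bonus, beta=5.0):
    """Probability of choosing arm 0 from value plus a signed uncertainty bonus.

    counts: [(wins, losses), (wins, losses)] for the two arms (assumed format).
    bonus:  hypothetical weight on the difference in posterior uncertainty.
    beta:   hypothetical inverse temperature of the logistic choice rule.
    """
    (m0, s0), (m1, s1) = (beta_mean_sd(*c) for c in counts)
    diff = (m0 - m1) + bonus * (s0 - s1)
    return 1.0 / (1.0 + math.exp(-beta * diff))


def win_stay_lose_shift(prev_choice, prev_reward):
    """Heuristic for two arms: repeat the previous arm after a reward, otherwise switch."""
    return prev_choice if prev_reward else 1 - prev_choice


# Both arms have the same estimated value, but arm 0 has been sampled far less.
counts = [(1, 1), (10, 10)]
p_seeker = choice_prob_arm0(counts, bonus=1.0)
p_avoider = choice_prob_arm0(counts, bonus=-1.0)
p_group_average = (p_seeker + p_avoider) / 2
```

In this sketch the seeker prefers the less-sampled arm and the avoider prefers the better-known arm, yet their average choice probability sits at chance. This is one simple way in which opposite individual preferences could appear as a null group-level effect.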