Laboratory of Neuropsychology, National Institute of Mental Health, National Institutes of Health, Bethesda, Maryland, United States of America.
PLoS Comput Biol. 2023 Jan 30;19(1):e1010873. doi: 10.1371/journal.pcbi.1010873. eCollection 2023 Jan.
Choice impulsivity is characterized by the choice of immediate, smaller reward options over future, larger reward options, and is often thought to be associated with negative life outcomes. However, some environments make future rewards more uncertain, and in these environments impulsive choices can be beneficial. Here we examined the conditions under which impulsive vs. non-impulsive decision strategies would be advantageous. We used Markov Decision Processes (MDPs) to model three common decision-making tasks: Temporal Discounting, Information Sampling, and an Explore-Exploit task. We manipulated environmental variables to create circumstances where future outcomes were relatively uncertain. We then manipulated the discount factor of an MDP agent, which affects the value of immediate versus future rewards, to model impulsive and non-impulsive behavior. This allowed us to examine the performance of impulsive and non-impulsive agents in more or less predictable environments. First, in the Temporal Discounting task, we manipulated the transition probability to delayed rewards. When the probability of receiving the future reward was low, the agent with the lower discount factor (i.e., the impulsive agent) collected more average reward than the agent with the higher discount factor (the non-impulsive agent), because it selected the immediate reward options. Second, in the Information Sampling task, we manipulated the amount of information obtained with each sample. When sampling led to small information gains, the impulsive MDP agent collected more average reward than the non-impulsive agent. Third, in the Explore-Exploit task, we manipulated the substitution rate for novel options. When the substitution rate was high, the impulsive agent again performed better than the non-impulsive agent, as it explored the novel options less and instead exploited options with known reward values. The results of these analyses show that impulsivity can be advantageous in environments that are unexpectedly uncertain.
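The Temporal Discounting result can be illustrated with a minimal sketch. It is not the model used in the study. It assumes a single choice between a small immediate reward and a large reward that arrives after a fixed number of steps, each of which is survived with some transition probability. It also assumes, as one reading of "unexpectedly uncertain", that the agent plans with a believed probability that may be higher than the true one. All names and numbers (`delayed_value`, `choose`, `expected_reward`, `compare`, the reward sizes, the delay, and the two discount factors) are hypothetical choices made for illustration.

```python
# Toy comparison of an impulsive and a non-impulsive agent.
# Illustrative assumptions only; this is not the task model from the paper.

def delayed_value(gamma, p, delay, large):
    # Each of the `delay` steps is discounted by gamma and survived with probability p.
    return (gamma * p) ** delay * large

def choose(gamma, p_believed, delay, small, large):
    # The agent compares discounted values under its believed transition probability.
    if delayed_value(gamma, p_believed, delay, large) > small:
        return "delayed"
    return "immediate"

def expected_reward(choice, p_true, delay, small, large):
    # Undiscounted expected payoff per trial in the true environment.
    if choice == "immediate":
        return small
    return p_true ** delay * large

# Hypothetical task parameters.
SMALL, LARGE, DELAY = 1.0, 3.0, 4
P_BELIEVED = 0.95  # transition probability the agents plan with
AGENTS = {"impulsive": 0.6, "non-impulsive": 0.99}  # discount factors

def compare(p_true):
    # Return each agent's choice and its expected reward when the true probability is p_true.
    results = {}
    for name, gamma in AGENTS.items():
        choice = choose(gamma, P_BELIEVED, DELAY, SMALL, LARGE)
        results[name] = (choice, expected_reward(choice, p_true, DELAY, SMALL, LARGE))
    return results

if __name__ == "__main__":
    for p_true in (0.95, 0.6):
        print(p_true, compare(p_true))
```

With these illustrative values, the non-impulsive agent earns more per trial when the true transition probability matches its belief, and the impulsive agent earns more when the true probability is lower than believed. The sketch scores reward per trial rather than per time step, which is a simplification.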