Performance-gated deliberation: A context-adapted strategy in which urgency is opportunity cost.

Affiliations

Mila, Québec AI Institute, Montréal, Canada.

Department of Computer Science & Operations Research, Université de Montréal, Montréal, Canada.

Publication information

PLoS Comput Biol. 2022 May 26;18(5):e1010080. doi: 10.1371/journal.pcbi.1010080. eCollection 2022 May.

DOI:10.1371/journal.pcbi.1010080

PMID:35617370

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9176815/

Abstract

Finding the right amount of deliberation, between insufficient and excessive, is a hard decision-making problem that depends on the value we place on our time. Average reward, putatively encoded by tonic dopamine, serves in existing reinforcement learning theory as the opportunity cost of time, including deliberation time. Importantly, this cost can itself vary with the environmental context and is not trivial to estimate. Here, we propose how the opportunity cost of deliberation can be estimated adaptively on multiple timescales to account for non-stationary contextual factors. We use it in a simple decision-making heuristic based on average-reward reinforcement learning (AR-RL) that we call Performance-Gated Deliberation (PGD). We propose PGD as a strategy used by animals wherein deliberation cost is implemented directly as urgency, a previously characterized neural signal that effectively controls the speed of the decision-making process. We show that PGD outperforms AR-RL solutions in explaining the behaviour and urgency of non-human primates in a context-varying random walk prediction task, and that it is consistent with relative performance and urgency in a context-varying random dot motion task. We make readily testable predictions for both neural activity and behaviour.
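The abstract does not give the estimator or the decision rule, so the following is only a minimal sketch of the general idea under stated assumptions: reward rate is tracked by exponential averages at several timescales, the opportunity cost of deliberating is that rate multiplied by elapsed time, and the cost is added to expected reward in the manner of an urgency signal. The names `RewardRateEstimator`, `opportunity_cost`, and `should_commit`, the plain mean across timescales, and the threshold rule are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class RewardRateEstimator:
    """Exponentially weighted estimate of reward per unit time."""
    timescale: float   # hypothetical averaging window, in seconds
    rate: float = 0.0  # current estimate of reward per second

    def update(self, reward: float, duration: float) -> float:
        # Longer trials pull the estimate further toward the new sample.
        alpha = min(1.0, duration / self.timescale)
        self.rate += alpha * (reward / duration - self.rate)
        return self.rate


def opportunity_cost(estimators, elapsed: float) -> float:
    # Assumption: timescales are combined by a plain mean of the rates.
    mean_rate = sum(e.rate for e in estimators) / len(estimators)
    return mean_rate * elapsed


def should_commit(expected_reward: float, cost: float, max_reward: float) -> bool:
    # Hypothetical urgency-style rule: the cost of elapsed time lowers
    # the bar that expected reward must clear before committing.
    return expected_reward + cost >= max_reward


# One trial lasting 2 s that paid 1 unit of reward.
slow = RewardRateEstimator(timescale=100.0)
fast = RewardRateEstimator(timescale=10.0)
for estimator in (slow, fast):
    estimator.update(reward=1.0, duration=2.0)

cost = opportunity_cost([slow, fast], elapsed=4.0)
print(should_commit(expected_reward=0.8, cost=cost, max_reward=1.0))
```

In this sketch the fast estimator reacts to recent changes in context while the slow one tracks the long-run rate, which is one simple way to picture estimation "on multiple timescales".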

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1425/9176815/568d5b22701f/pcbi.1010080.g001.jpg

Similar articles

Normative decision rules in changing environments.

Elife. 2022 Oct 25;11:e79824. doi: 10.7554/eLife.79824.

Dopamine Manipulation Affects Response Vigor Independently of Opportunity Cost.

J Neurosci. 2016 Sep 14;36(37):9516-25. doi: 10.1523/JNEUROSCI.4467-15.2016.

Multiple memory systems as substrates for multiple decision systems.

Neurobiol Learn Mem. 2015 Jan;117:4-13. doi: 10.1016/j.nlm.2014.04.014. Epub 2014 May 15.

Episodic memory governs choices: An RNN-based reinforcement learning model for decision-making task.

Neural Netw. 2021 Feb;134:1-10. doi: 10.1016/j.neunet.2020.11.003. Epub 2020 Nov 18.

Testing models of context-dependent outcome encoding in reinforcement learning.

Cognition. 2023 Jan;230:105280. doi: 10.1016/j.cognition.2022.105280. Epub 2022 Sep 12.

How we learn to make decisions: rapid propagation of reinforcement learning prediction errors in humans.

J Cogn Neurosci. 2014 Mar;26(3):635-44. doi: 10.1162/jocn_a_00509. Epub 2013 Oct 29.

Reinforcement learning signals in the human striatum distinguish learners from nonlearners during reward-based decision making.

J Neurosci. 2007 Nov 21;27(47):12860-7. doi: 10.1523/JNEUROSCI.2496-07.2007.

Mice learn to avoid regret.

PLoS Biol. 2018 Jun 21;16(6):e2005853. doi: 10.1371/journal.pbio.2005853. eCollection 2018 Jun.

Dopaminergic circuitry and risk/reward decision making: implications for schizophrenia.

Schizophr Bull. 2015 Jan;41(1):9-14. doi: 10.1093/schbul/sbu165. Epub 2014 Nov 17.
