
Learning optimal decisions with confidence.

Affiliations

Department of Neurobiology, Harvard Medical School, Boston, MA 02115;

Champalimaud Research, Champalimaud Centre for the Unknown, 1400-038 Lisbon, Portugal.

Publication information

Proc Natl Acad Sci U S A. 2019 Dec 3;116(49):24872-24880. doi: 10.1073/pnas.1906787116. Epub 2019 Nov 15.

Abstract

Diffusion decision models (DDMs) are immensely successful models for decision making under uncertainty and time pressure. In the context of perceptual decision making, these models typically start with two input units, organized in a neuron-antineuron pair. In contrast, in the brain, sensory inputs are encoded through the activity of large neuronal populations. Moreover, while DDMs are wired by hand, the nervous system must learn the weights of the network through trial and error. There is currently no normative theory of learning in DDMs and therefore no theory of how decision makers could learn to make optimal decisions in this context. Here, we derive such a rule for learning a near-optimal linear combination of DDM inputs based on trial-by-trial feedback. The rule is Bayesian in the sense that it learns not only the mean of the weights but also the uncertainty around this mean in the form of a covariance matrix. In this rule, the rate of learning is proportional (respectively, inversely proportional) to confidence for incorrect (respectively, correct) decisions. Furthermore, we show that, in volatile environments, the rule predicts a bias toward repeating the same choice after correct decisions, with a bias strength that is modulated by the previous choice's difficulty. Finally, we extend our learning rule to cases for which one of the choices is more likely a priori, which provides insights into how such biases modulate the mechanisms leading to optimal decisions in diffusion models.
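
As a rough illustration of the mechanism the abstract describes, the sketch below wires a drift-diffusion decision to a learned linear readout of a noisy input population and updates the readout weights with a learning rate that grows with confidence after errors and shrinks with confidence after correct choices. This is a minimal sketch under stated assumptions, not the paper's derived Bayesian rule: the logistic confidence proxy, the feedback-gated Hebbian update, and all names (simulate_trial, w_true, ETA) are hypothetical, and the covariance tracking and volatility effects described in the abstract are not modeled here.

```python
# Minimal illustrative sketch (assumptions labeled in comments), NOT the paper's
# derived rule: a bounded accumulation of a weighted population input, with the
# readout weights updated by a feedback-gated Hebbian step whose rate is
# modulated by a crude confidence proxy.

import numpy as np

rng = np.random.default_rng(0)

N_INPUTS = 50        # size of the input population
DT = 0.01            # integration time step
BOUND = 1.0          # symmetric decision bound on the accumulated evidence
MAX_STEPS = 1000     # hard cap on trial length
ETA = 0.05           # base learning rate (assumed value)

# Hidden "signal" direction in the population; the learner must align with it.
w_true = rng.normal(size=N_INPUTS)
w_true /= np.linalg.norm(w_true)

# Learned readout weights, initialised small and random.
w = rng.normal(scale=0.1, size=N_INPUTS)


def simulate_trial(w, coherence):
    """Integrate the weighted population input to a bound.

    Returns the choice (+1/-1), the correct answer (+1/-1), the decision
    variable at the end of integration, and the mean population activity.
    """
    answer = rng.choice([-1.0, 1.0])
    x = 0.0
    r_sum = np.zeros(N_INPUTS)
    for step in range(1, MAX_STEPS + 1):
        # Population response: signal along w_true plus independent noise.
        r = answer * coherence * w_true + rng.normal(size=N_INPUTS)
        r_sum += r
        x += np.dot(w, r) * DT
        if abs(x) >= BOUND:
            break
    choice = 1.0 if x >= 0 else -1.0
    return choice, answer, x, r_sum / step


for trial in range(500):
    coherence = rng.choice([1.0, 4.0, 16.0])       # stimulus difficulty
    choice, answer, dv, r_mean = simulate_trial(w, coherence)
    correct = choice == answer

    # Crude confidence proxy: a logistic squashing of |decision variable|.
    conf = 1.0 / (1.0 + np.exp(-abs(dv)))

    # Learning rate grows with confidence after errors and shrinks with
    # confidence after correct choices (the qualitative pattern stated in the
    # abstract; this exact functional form is an assumption).
    lr = ETA * conf if not correct else ETA * (1.0 - conf)

    # Feedback-gated Hebbian step: push w toward the input direction that the
    # feedback says carried the signal, then keep the norm bounded.
    w += lr * answer * r_mean
    norm = np.linalg.norm(w)
    if norm > 1.0:
        w /= norm

print("alignment with true weights:", float(np.dot(w, w_true)))
```

Run as-is, the printed alignment between the learned and true weights should rise toward 1 over trials, which is the qualitative behavior the sketch is meant to convey.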


Similar articles

1. Learning optimal decisions with confidence. Proc Natl Acad Sci U S A. 2019 Dec 3;116(49):24872-24880. doi: 10.1073/pnas.1906787116. Epub 2019 Nov 15.
6. Reward-modulated Hebbian learning of decision making. Neural Comput. 2010 Jun;22(6):1399-444. doi: 10.1162/neco.2010.03-09-980.

Cited by

5. A low-dimensional approximation of optimal confidence. PLoS Comput Biol. 2024 Jul 24;20(7):e1012273. doi: 10.1371/journal.pcbi.1012273. eCollection 2024 Jul.
6. Bayesian confidence in optimal decisions. Psychol Rev. 2024 Oct;131(5):1114-1160. doi: 10.1037/rev0000472. Epub 2024 Jul 18.

