Department of Neurobiology, Harvard Medical School, Boston, MA 02115;
Champalimaud Research, Champalimaud Centre for the Unknown, 1400-038 Lisbon, Portugal.
Proc Natl Acad Sci U S A. 2019 Dec 3;116(49):24872-24880. doi: 10.1073/pnas.1906787116. Epub 2019 Nov 15.
Diffusion decision models (DDMs) are immensely successful models for decision making under uncertainty and time pressure. In the context of perceptual decision making, these models typically start with two input units, organized in a neuron-antineuron pair. In contrast, in the brain, sensory inputs are encoded through the activity of large neuronal populations. Moreover, while DDMs are wired by hand, the nervous system must learn the weights of the network through trial and error. There is currently no normative theory of learning in DDMs and therefore no theory of how decision makers could learn to make optimal decisions in this context. Here, we derive such a rule for learning a near-optimal linear combination of DDM inputs based on trial-by-trial feedback. The rule is Bayesian in the sense that it learns not only the mean of the weights but also the uncertainty around this mean in the form of a covariance matrix. In this rule, the rate of learning is proportional (respectively, inversely proportional) to confidence for incorrect (respectively, correct) decisions. Furthermore, we show that, in volatile environments, the rule predicts a bias toward repeating the same choice after correct decisions, with a bias strength that is modulated by the previous choice's difficulty. Finally, we extend our learning rule to cases for which one of the choices is more likely a priori, which provides insights into how such biases modulate the mechanisms leading to optimal decisions in diffusion models.
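The mechanism the abstract describes can be illustrated with a toy simulation. The sketch below is not the paper's derived rule: the bounded evidence accumulator is a standard DDM discretization, and the confidence proxy (`|mu @ s|`), rate schedule, and covariance shrinkage are illustrative assumptions chosen only to reproduce the qualitative signature stated in the abstract — a learning rate that grows with confidence after errors and shrinks with confidence after correct choices, applied to a Gaussian belief (mean plus covariance) over the linear readout weights. All variable names (`true_w`, `s`, `mu`, `Sigma`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_ddm(drift, bound=1.0, dt=0.01, sigma=1.0, max_t=2.0):
    """Integrate noisy evidence until a bound (or the time limit) is reached."""
    a, t = 0.0, 0.0
    while abs(a) < bound and t < max_t:
        a += drift * dt + sigma * np.sqrt(dt) * rng.standard_normal()
        t += dt
    return (1 if a >= 0 else -1), t  # choice, decision time

def confidence_weighted_update(mu, Sigma, s, choice, correct, base_lr=0.05):
    """Update a Gaussian belief (mean + covariance) over the readout weights.

    The confidence proxy and rate schedule are illustrative stand-ins, not
    the rule derived in the paper; they only mimic its qualitative behavior:
    learn more after low-confidence errors, less after high-confidence
    correct choices.
    """
    conf = abs(mu @ s)                               # crude confidence proxy
    rate = base_lr * (1.0 + conf) if not correct else base_lr / (1.0 + conf)
    target = choice if correct else -choice          # side the feedback endorses
    g = Sigma @ (target * s)                         # uncertainty-preconditioned step
    mu = mu + rate * g
    Sigma = Sigma - rate * np.outer(g, g) / (1.0 + s @ Sigma @ s)
    return mu, Sigma

# Toy task: a population of 4 input neurons whose mean activity flips sign
# with the stimulus category; the learner must find the linear readout
# from trial-by-trial correct/incorrect feedback alone.
d = 4
true_w = np.array([1.0, -1.0, 0.5, -0.5])            # hypothetical tuning
mu, Sigma = np.zeros(d), np.eye(d)
for _ in range(200):
    category = rng.choice([-1, 1])
    s = category * true_w + rng.standard_normal(d)   # noisy population input
    choice, _ = simulate_ddm(mu @ s)
    mu, Sigma = confidence_weighted_update(mu, Sigma, s, choice, choice == category)
```

Note the design choice of preconditioning the weight step by the covariance `Sigma`: directions in which the belief is still uncertain get larger updates, which is the qualitative role the covariance plays in the Bayesian rule the abstract summarizes.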