Center for Neural Science and Department of Psychology, New York University, New York, NY, USA.
Department of Behavioural and Cognitive Sciences, Université du Luxembourg, Esch-Belval, Luxembourg.
Nat Neurosci. 2024 Jul;27(7):1333-1339. doi: 10.1038/s41593-024-01671-x. Epub 2024 Jun 19.
We use efficient coding principles borrowed from sensory neuroscience to derive the optimal neural population to encode a reward distribution. We show that the responses of dopaminergic reward prediction error neurons in mouse and macaque are similar to those of the efficient code in the following ways: the neurons have a broad distribution of midpoints covering the reward distribution; neurons with higher thresholds have higher gains, more convex tuning functions and lower slopes; and their slope is higher when the reward distribution is narrower. Furthermore, we derive learning rules that converge to the efficient code. The learning rule for the position of the neuron on the reward axis closely resembles distributional reinforcement learning. Thus, reward prediction error neuron responses may be optimized to broadcast an efficient reward signal, forming a connection between efficient coding and reinforcement learning, two of the most successful theories in computational neuroscience.
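The abstract states that the derived learning rule for a neuron's position on the reward axis "closely resembles distributional reinforcement learning" but does not give its exact form. Below is a minimal sketch of the expectile-style asymmetric update from distributional reinforcement learning (Dabney et al., 2020), the rule the abstract's is said to resemble; the reward distribution, learning rate, and population size are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical reward distribution (assumed for illustration; the
# abstract's claims apply to an arbitrary reward distribution).
rewards = rng.normal(loc=5.0, scale=2.0, size=50_000)

# Population of units, each with its own asymmetry tau in (0, 1).
# In distributional RL, tau sets the relative learning rate for
# positive versus negative prediction errors.
n_units = 10
taus = np.linspace(0.05, 0.95, n_units)
thresholds = np.zeros(n_units)  # positions on the reward axis
lr = 0.01                       # base learning rate (assumed value)

for r in rewards:
    delta = r - thresholds  # per-unit reward prediction error
    # Asymmetric update: positive errors scaled by tau,
    # negative errors by (1 - tau).
    thresholds += lr * np.where(delta > 0, taus, 1.0 - taus) * delta

# Each threshold converges to the tau-expectile of the reward
# distribution, so the units' positions spread out to cover it.
print(np.round(thresholds, 2))
```

Under this rule each unit's threshold converges to a different expectile of the reward distribution, so the population ends up with a broad spread of midpoints covering that distribution, consistent with the property the abstract reports for dopaminergic reward prediction error neurons.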