Zhang Zhewei, Costa Kauê M, Langdon Angela J, Schoenbaum Geoffrey
National Institute on Drug Abuse Intramural Research Program, National Institutes of Health, Baltimore, MD 21224, USA.
Department of Psychology, University of Alabama at Birmingham, Birmingham, AL 35233, USA.
Trends Cogn Sci. 2025 May;29(5):434-447. doi: 10.1016/j.tics.2025.02.001. Epub 2025 Feb 26.
Over recent decades, temporal difference reinforcement learning (TDRL) models have successfully explained much of dopamine (DA) activity. This success has invited heightened scrutiny of late, with many studies challenging the validity of TDRL models of DA function. Yet, when evaluating the validity of these models, the devil is truly in the details. TDRL is a broad class of algorithms that share core ideas but differ greatly in implementation and predictions. Thus, it is important to identify the defining aspects of the TDRL framework being tested and to use state spaces and model architectures that capture the known complexity of the behavioral representations and neural systems involved. Here, we discuss several examples that illustrate the importance of these considerations.
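For context, the core quantity shared across TDRL variants is the temporal difference prediction error, the signal most often compared with phasic DA responses. Below is a minimal tabular TD(0) sketch of that idea; the toy state names, learning rate, discount factor, and trial structure are illustrative assumptions, not details taken from the article.

```python
# Minimal tabular TD(0) sketch. The prediction error `delta` is the quantity
# most often compared with phasic dopamine activity. All parameter values and
# the toy cue -> reward sequence are illustrative assumptions.

ALPHA = 0.1   # learning rate (assumed)
GAMMA = 0.9   # discount factor (assumed)

# Value estimates for a toy task: a cue state followed by a reward state.
V = {"cue": 0.0, "reward": 0.0, "terminal": 0.0}

def td_update(V, state, next_state, r):
    """One TD(0) step: delta = r + gamma * V(s') - V(s); V(s) += alpha * delta."""
    delta = r + GAMMA * V[next_state] - V[state]
    V[state] += ALPHA * delta
    return delta

# Repeated trials: cue -> reward (r = 1) -> terminal (r = 0).
for trial in range(200):
    d_cue = td_update(V, "cue", "reward", r=0.0)
    d_rew = td_update(V, "reward", "terminal", r=1.0)

# With training, the error at reward delivery shrinks while the error at the
# cue grows -- the classic pattern used to interpret dopamine recordings.
print(V, d_cue, d_rew)
```

Note that different TDRL variants change the state space, the value function approximator, or the eligibility of past states, which is why their predictions for DA activity can diverge even though all of them compute some form of this prediction error.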