Suppr 超能文献



Joint modeling of reaction times and choice improves parameter identifiability in reinforcement learning models.

Affiliations

Neurosciences Graduate Training Program, Stanford University, Stanford, CA 94305, USA; Helen Wills Neuroscience Institute, University of California, Berkeley, CA, 94720, USA; Department of Psychology, Arizona State University, Tempe, AZ 85287, USA.

Department of Psychology, Arizona State University, Tempe, AZ 85287, USA.

Publication Information

J Neurosci Methods. 2019 Apr 1;317:37-44. doi: 10.1016/j.jneumeth.2019.01.006. Epub 2019 Jan 18.

DOI: 10.1016/j.jneumeth.2019.01.006
PMID: 30664916
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8930195/
Abstract

BACKGROUND

Reinforcement learning models provide excellent descriptions of learning in multiple species across a variety of tasks. Many researchers are interested in relating parameters of reinforcement learning models to neural measures, psychological variables, or experimental manipulations. We demonstrate that parameter identification is difficult because a range of parameter values provide approximately equal-quality fits to data. This identification problem has a large impact on power: we show that a researcher who wants to detect a medium-sized correlation (r = .3) between a variable and learning rate with 80% power must collect 60% more subjects than specified by a typical power analysis, to account for the noise introduced by model fitting.
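The baseline figure behind this power claim can be checked with the standard Fisher z approximation for correlation tests. A minimal sketch — the function name and hard-coded normal quantiles are ours, and the 60% inflation factor is simply the figure reported in the abstract, not derived here:

```python
import math

def n_for_correlation(r):
    """Sample size needed to detect correlation r at two-sided alpha = .05
    with 80% power, via the Fisher z approximation."""
    z_alpha = 1.959964  # standard normal quantile for two-sided alpha = .05
    z_beta = 0.841621   # standard normal quantile for 80% power
    fisher_z = 0.5 * math.log((1 + r) / (1 - r))
    return math.ceil(((z_alpha + z_beta) / fisher_z) ** 2 + 3)

n_typical = n_for_correlation(0.3)       # a typical power analysis: ~85 subjects
n_adjusted = math.ceil(n_typical * 1.6)  # 60% more to absorb model-fitting noise
print(n_typical, n_adjusted)
```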

NEW METHOD

We derive a Bayesian optimal model fitting technique that takes advantage of information contained in choices and reaction times to constrain parameter estimates.
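The general idea of a joint likelihood over choices and reaction times can be sketched as a softmax choice term plus a log-normal RT term whose expected log-RT shrinks as the value difference between options grows. This is an illustration of the approach, not the authors' exact model; the parameter names (`rt_scale`, `rt_sigma`) and the specific RT distribution are our assumptions:

```python
import math

def joint_loglik(choices, rts, rewards, alpha, beta, rt_scale, rt_sigma):
    """Joint log-likelihood of choices (0/1) and RTs (seconds) in a
    two-armed bandit, under Q-learning with learning rate alpha and
    softmax inverse temperature beta. RT term is illustrative: log-normal
    with mean rt_scale minus the absolute value difference."""
    q = [0.5, 0.5]  # initial action values
    ll = 0.0
    for c, rt, r in zip(choices, rts, rewards):
        # softmax probability of the chosen option
        p = 1.0 / (1.0 + math.exp(-beta * (q[c] - q[1 - c])))
        ll += math.log(p)
        # log-normal RT log-density; easier decisions (bigger |Q0 - Q1|) are faster
        mu = rt_scale - abs(q[0] - q[1])
        z = (math.log(rt) - mu) / rt_sigma
        ll += -math.log(rt * rt_sigma * math.sqrt(2 * math.pi)) - 0.5 * z * z
        # Q-learning update
        q[c] += alpha * (r - q[c])
    return ll
```

Because the RT term also depends on the evolving Q-values, it carries extra information about the learning rate that the choice term alone does not — which is the mechanism by which RTs constrain parameter estimates.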

RESULTS

We show using simulation and empirical data that this method substantially improves the ability to recover learning rates.
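The kind of parameter-recovery simulation referred to here can be sketched as: simulate a Q-learning agent with a known learning rate, then recover it by maximizing the choice likelihood over a grid. The reward probabilities, fixed inverse temperature, grid, and seed are all illustrative assumptions:

```python
import math
import random

def simulate(alpha, beta, n_trials, p_reward=(0.8, 0.2), seed=0):
    """Simulate a two-armed bandit learner: Q-learning + softmax choice."""
    rng = random.Random(seed)
    q = [0.5, 0.5]
    choices, rewards = [], []
    for _ in range(n_trials):
        p0 = 1.0 / (1.0 + math.exp(-beta * (q[0] - q[1])))
        c = 0 if rng.random() < p0 else 1
        r = 1.0 if rng.random() < p_reward[c] else 0.0
        choices.append(c)
        rewards.append(r)
        q[c] += alpha * (r - q[c])
    return choices, rewards

def choice_loglik(choices, rewards, alpha, beta):
    """Log-likelihood of the observed choices under candidate parameters."""
    q = [0.5, 0.5]
    ll = 0.0
    for c, r in zip(choices, rewards):
        p = 1.0 / (1.0 + math.exp(-beta * (q[c] - q[1 - c])))
        ll += math.log(p)
        q[c] += alpha * (r - q[c])
    return ll

# Recover the learning rate from choices alone by grid search
# (with beta fixed at its generating value); ideally `best` lands
# near the true alpha of 0.30, but choice-only fits can be noisy.
choices, rewards = simulate(alpha=0.30, beta=5.0, n_trials=500)
grid = [i / 100 for i in range(1, 100)]
best = max(grid, key=lambda a: choice_loglik(choices, rewards, a, 5.0))
```

Repeating this recovery with a joint choice + RT likelihood in place of `choice_loglik` is the comparison the Results section describes.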

COMPARISON WITH EXISTING METHODS

We compare this method against the use of Bayesian priors. We show in simulations that the combined use of Bayesian priors and reaction times confers the highest parameter identifiability. However, in real data where the priors may have been misspecified, the use of Bayesian priors interferes with the ability of reaction time data to improve parameter identifiability.

CONCLUSIONS

We present a simple technique that takes advantage of readily available data to substantially improve the quality of inferences that can be drawn from parameters of reinforcement learning models.


Figures (full-size images):
Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0322/8930195/ad7598655efc/nihms-1784112-f0001.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0322/8930195/6f2c47279586/nihms-1784112-f0002.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0322/8930195/30295b91946d/nihms-1784112-f0003.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0322/8930195/102bfd268dd0/nihms-1784112-f0004.jpg

Similar Articles

1
Joint modeling of reaction times and choice improves parameter identifiability in reinforcement learning models.
J Neurosci Methods. 2019 Apr 1;317:37-44. doi: 10.1016/j.jneumeth.2019.01.006. Epub 2019 Jan 18.
2
The relative merit of empirical priors in non-identifiable and sloppy models: Applications to models of learning and decision-making: Empirical priors.
Psychon Bull Rev. 2018 Dec;25(6):2047-2068. doi: 10.3758/s13423-018-1446-5.
3
ASAS-NANP symposium: Mathematical Modeling in Animal Nutrition: The power of identifiability analysis for dynamic modeling in animal science: a practitioner approach.
J Anim Sci. 2023 Jan 3;101. doi: 10.1093/jas/skad320.
4
Dynamic models of choice.
Behav Res Methods. 2019 Apr;51(2):961-985. doi: 10.3758/s13428-018-1067-y.
5
The drift diffusion model as the choice rule in reinforcement learning.
Psychon Bull Rev. 2017 Aug;24(4):1234-1251. doi: 10.3758/s13423-016-1199-y.
6
Dopaminergic Modulation of Human Intertemporal Choice: A Diffusion Model Analysis Using the D2-Receptor Antagonist Haloperidol.
J Neurosci. 2020 Oct 7;40(41):7936-7948. doi: 10.1523/JNEUROSCI.0592-20.2020. Epub 2020 Sep 18.
7
Simultaneous Hierarchical Bayesian Parameter Estimation for Reinforcement Learning and Drift Diffusion Models: a Tutorial and Links to Neural Data.
Comput Brain Behav. 2020 Dec;3(4):458-471. doi: 10.1007/s42113-020-00084-w. Epub 2020 May 26.
8
The drift diffusion model as the choice rule in inter-temporal and risky choice: A case study in medial orbitofrontal cortex lesion patients and controls.
PLoS Comput Biol. 2020 Apr 20;16(4):e1007615. doi: 10.1371/journal.pcbi.1007615. eCollection 2020 Apr.
9
A reinforcement learning diffusion decision model for value-based decisions.
Psychon Bull Rev. 2019 Aug;26(4):1099-1121. doi: 10.3758/s13423-018-1554-2.
10
Assessing the practical differences between model selection methods in inferences about choice response time tasks.
Psychon Bull Rev. 2019 Aug;26(4):1070-1098. doi: 10.3758/s13423-018-01563-9.

Cited By

1
Further examining how animals weigh conflicting information about reward sources over time.
Anim Cogn. 2025 Jul 30;28(1):74. doi: 10.1007/s10071-025-01982-x.
2
Bayesian Workflow for Generative Modeling in Computational Psychiatry.
Comput Psychiatr. 2025 Mar 25;9(1):76-99. doi: 10.5334/cpsy.116. eCollection 2025.
3
A dopaminergic basis of behavioral control.
4
Frontostriatal and Dopamine Markers of Individual Differences in Reinforcement Learning: A Multi-modal Investigation.
bioRxiv. 2024 Oct 2:2024.09.17.613524. doi: 10.1101/2024.09.17.613524.
5
High stakes slow responding, but do not help overcome Pavlovian biases in humans.
Learn Mem. 2024 Sep 16;31(8). doi: 10.1101/lm.054017.124. Print 2024 Aug.
6
Cardiac-Sympathetic Contractility and Neural Alpha-Band Power: Cross-Modal Collaboration during Approach-Avoidance Conflict.
J Neurosci. 2024 Oct 9;44(41):e2008232024. doi: 10.1523/JNEUROSCI.2008-23.2024.
7
Potential association between suicide risk, aggression, impulsivity, and the somatosensory system.
Soc Cogn Affect Neurosci. 2024 Jul 2;19(1). doi: 10.1093/scan/nsae041.
8
Decomposition of Reinforcement Learning Deficits in Disordered Gambling via Drift Diffusion Modeling and Functional Magnetic Resonance Imaging.
Comput Psychiatr. 2024 Mar 20;8(1):23-45. doi: 10.5334/cpsy.104. eCollection 2024.
9
Human decision making balances reward maximization and policy compression.
PLoS Comput Biol. 2024 Apr 26;20(4):e1012057. doi: 10.1371/journal.pcbi.1012057. eCollection 2024 Apr.
10
Active reinforcement learning versus action bias and hysteresis: control with a mixture of experts and nonexperts.
PLoS Comput Biol. 2024 Mar 29;20(3):e1011950. doi: 10.1371/journal.pcbi.1011950. eCollection 2024 Mar.
11
Test-retest reliability of reinforcement learning parameters.
Behav Res Methods. 2024 Aug;56(5):4582-4599. doi: 10.3758/s13428-023-02203-4. Epub 2023 Sep 8.

References

1
The relative merit of empirical priors in non-identifiable and sloppy models: Applications to models of learning and decision-making: Empirical priors.
Psychon Bull Rev. 2018 Dec;25(6):2047-2068. doi: 10.3758/s13423-018-1446-5.
2
Dissociable effects of surprising rewards on learning and memory.
J Exp Psychol Learn Mem Cogn. 2018 Sep;44(9):1430-1443. doi: 10.1037/xlm0000518. Epub 2018 Mar 19.
3
Frontostriatal and Dopamine Markers of Individual Differences in Reinforcement Learning: A Multi-modal Investigation.
Cereb Cortex. 2018 Dec 1;28(12):4281-4290. doi: 10.1093/cercor/bhx281.
4
More Is Meaningful: The Magnitude Effect in Intertemporal Choice Depends on Self-Control.
Psychol Sci. 2017 Oct;28(10):1443-1454. doi: 10.1177/0956797617711455. Epub 2017 Aug 31.
5
Dynamic Interaction between Reinforcement Learning and Attention in Multidimensional Environments.
Neuron. 2017 Jan 18;93(2):451-463. doi: 10.1016/j.neuron.2016.12.040.
6
Amygdala and Ventral Striatum Make Distinct Contributions to Reinforcement Learning.
Neuron. 2016 Oct 19;92(2):505-517. doi: 10.1016/j.neuron.2016.09.025. Epub 2016 Oct 6.
7
Characterizing a psychiatric symptom dimension related to deficits in goal-directed control.
Elife. 2016 Mar 1;5:e11305. doi: 10.7554/eLife.11305.
8
Neural underpinnings of the evidence accumulator.
Curr Opin Neurobiol. 2016 Apr;37:149-157. doi: 10.1016/j.conb.2016.01.003. Epub 2016 Feb 12.
9
Model-based approaches to neuroimaging: combining reinforcement learning theory with fMRI data.
Wiley Interdiscip Rev Cogn Sci. 2010 Jul;1(4):501-510. doi: 10.1002/wcs.57. Epub 2010 Apr 2.
10
Big Correlations in Little Studies: Inflated fMRI Correlations Reflect Low Statistical Power - Commentary on Vul et al. (2009).
Perspect Psychol Sci. 2009 May;4(3):294-8. doi: 10.1111/j.1745-6924.2009.01127.x.