Suppr 超能文献


HMM for discovering decision-making dynamics using reinforcement learning experiments.

Authors

Guo Xingche, Zeng Donglin, Wang Yuanjia

Affiliations

Department of Biostatistics, Columbia University, 722 West 168th St, New York, NY, 10032, United States.

Department of Biostatistics, University of Michigan, 1415 Washington Heights, Ann Arbor, MI, 48109, United States.

Publication

Biostatistics. 2024 Dec 31;26(1). doi: 10.1093/biostatistics/kxae033.

DOI: 10.1093/biostatistics/kxae033
PMID: 39226534
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12090054/
Abstract

Major depressive disorder (MDD), a leading cause of years of life lived with disability, presents challenges in diagnosis and treatment due to its complex and heterogeneous nature. Emerging evidence indicates that reward processing abnormalities may serve as a behavioral marker for MDD. To measure reward processing, patients perform computer-based behavioral tasks in the laboratory that involve making choices or responding to stimuli associated with different outcomes, such as gains or losses. Reinforcement learning (RL) models are fitted to extract parameters that measure various aspects of reward processing (e.g. reward sensitivity) to characterize how patients make decisions in behavioral tasks. Recent findings suggest the inadequacy of characterizing reward learning solely based on a single RL model; instead, there may be a switching of decision-making processes between multiple strategies. An important scientific question is how the dynamics of strategies in decision-making affect the reward learning ability of individuals with MDD. Motivated by the probabilistic reward task within the Establishing Moderators and Biosignatures of Antidepressant Response in Clinical Care (EMBARC) study, we propose a novel RL-HMM (hidden Markov model) framework for analyzing reward-based decision-making. Our model accommodates decision-making strategy switching between two distinct approaches under an HMM: subjects making decisions based on the RL model or opting for random choices. We account for continuous RL state space and allow time-varying transition probabilities in the HMM. We introduce a computationally efficient expectation-maximization (EM) algorithm for parameter estimation and use a nonparametric bootstrap for inference. Extensive simulation studies validate the finite-sample performance of our method. We apply our approach to the EMBARC study to show that MDD patients are less engaged in RL compared to the healthy controls, and engagement is associated with brain activities in the negative affect circuitry during an emotional conflict task.
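The two-strategy switching model described in the abstract can be sketched as a hidden Markov model whose emissions come either from an RL policy or from uniform random choice. The sketch below is illustrative, not the paper's estimator: it assumes a Rescorla-Wagner Q-update with a softmax policy for the "engaged" state, uses a fixed 2x2 transition matrix in place of the paper's time-varying transition probabilities, and computes only the marginal likelihood via the forward algorithm (the EM step that the authors build on top of this is omitted). All names and parameter values are hypothetical.

```python
import numpy as np

def rl_hmm_loglik(choices, rewards, lr, beta, trans, n_actions=2):
    """Marginal log-likelihood of a two-strategy RL-HMM (forward algorithm).

    Latent state 0: random strategy (uniform over actions).
    Latent state 1: RL strategy (softmax over Q-values updated by a
    Rescorla-Wagner rule with learning rate `lr`, inverse temperature `beta`).
    `trans` is a fixed 2x2 row-stochastic transition matrix; the paper instead
    lets these probabilities vary over time and estimates everything by EM.
    """
    Q = np.zeros(n_actions)
    fwd = np.array([0.5, 0.5])          # flat prior over latent strategies
    ll = 0.0
    for t, (a, r) in enumerate(zip(choices, rewards)):
        p_soft = np.exp(beta * (Q - Q.max()))
        p_soft /= p_soft.sum()          # softmax choice probabilities
        emis = np.array([1.0 / n_actions, p_soft[a]])  # P(choice | state)
        pred = fwd if t == 0 else fwd @ trans          # one-step prediction
        fwd = pred * emis
        ll += np.log(fwd.sum())         # accumulate scaled likelihood
        fwd /= fwd.sum()                # rescale to avoid underflow
        Q[a] += lr * (r - Q[a])         # RL update from observed feedback
    return ll
```

A quick sanity check: with `beta = 0` the RL state also chooses uniformly, so the log-likelihood reduces to T * log(1/2) regardless of the transition matrix. A full EM fit would pair this forward pass with a backward pass to obtain posteriors over the latent strategy at each trial, then maximize over the RL and transition parameters.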


Similar Articles

1. HMM for discovering decision-making dynamics using reinforcement learning experiments.
   Biostatistics. 2024 Dec 31;26(1). doi: 10.1093/biostatistics/kxae033.
2. A Semiparametric Inverse Reinforcement Learning Approach to Characterize Decision Making for Mental Disorders.
   J Am Stat Assoc. 2024;119(545):27-38. doi: 10.1080/01621459.2023.2261184. Epub 2023 Nov 22.
3. Dorsal-Ventral Reinforcement Learning Network Connectivity and Incentive-Driven Changes in Exploration.
   J Neurosci. 2025 Apr 9;45(15):e0422242025. doi: 10.1523/JNEUROSCI.0422-24.2025.
4. Inverse reinforcement learning for intelligent mechanical ventilation and sedative dosing in intensive care units.
   BMC Med Inform Decis Mak. 2019 Apr 9;19(Suppl 2):57. doi: 10.1186/s12911-019-0763-6.
5. Multiple memory systems as substrates for multiple decision systems.
   Neurobiol Learn Mem. 2015 Jan;117:4-13. doi: 10.1016/j.nlm.2014.04.014. Epub 2014 May 15.
6. Probing relationships between reinforcement learning and simple behavioral strategies to understand probabilistic reward learning.
   J Neurosci Methods. 2020 Jul 15;341:108777. doi: 10.1016/j.jneumeth.2020.108777. Epub 2020 May 15.
7. Spared internal but impaired external reward prediction error signals in major depressive disorder during reinforcement learning.
   Depress Anxiety. 2017 Jan;34(1):89-96. doi: 10.1002/da.22576. Epub 2016 Oct 26.
8. Association of Neural and Emotional Impacts of Reward Prediction Errors With Major Depression.
   JAMA Psychiatry. 2017 Aug 1;74(8):790-797. doi: 10.1001/jamapsychiatry.2017.1713.
9. Decomposing the effects of context valence and feedback information on speed and accuracy during reinforcement learning: a meta-analytical approach using diffusion decision modeling.
   Cogn Affect Behav Neurosci. 2019 Jun;19(3):490-502. doi: 10.3758/s13415-019-00723-1.
10. Abnormal approach-related motivation but spared reinforcement learning in MDD: Evidence from fronto-midline Theta oscillations and frontal Alpha asymmetry.
   Cogn Affect Behav Neurosci. 2019 Jun;19(3):759-777. doi: 10.3758/s13415-019-00693-4.

References Cited in This Article

1. A Semiparametric Inverse Reinforcement Learning Approach to Characterize Decision Making for Mental Disorders.
   J Am Stat Assoc. 2024;119(545):27-38. doi: 10.1080/01621459.2023.2261184. Epub 2023 Nov 22.
2. Mice alternate between discrete strategies during perceptual decision-making.
   Nat Neurosci. 2022 Feb;25(2):201-212. doi: 10.1038/s41593-021-01007-z. Epub 2022 Feb 7.
3. Sex differences in learning from exploration.
   Elife. 2021 Nov 19;10:e69748. doi: 10.7554/eLife.69748.
4. Brain regulation of emotional conflict predicts antidepressant treatment response for depression.
   Nat Hum Behav. 2019 Dec;3(12):1319-1331. doi: 10.1038/s41562-019-0732-1. Epub 2019 Sep 23.
5. An effect of serotonergic stimulation on learning rates for rewards apparent after long intertrial intervals.
   Nat Commun. 2018 Jun 26;9(1):2477. doi: 10.1038/s41467-018-04840-2.
6. Precision psychiatry: a neural circuit taxonomy for depression and anxiety.
   Lancet Psychiatry. 2016 May;3(5):472-80. doi: 10.1016/S2215-0366(15)00579-9. Epub 2016 Apr 14.
7. Establishing moderators and biosignatures of antidepressant response in clinical care (EMBARC): Rationale and design.
   J Psychiatr Res. 2016 Jul;78:11-23. doi: 10.1016/j.jpsychires.2016.03.001. Epub 2016 Mar 15.
8. Computational psychiatry as a bridge from neuroscience to clinical applications.
   Nat Neurosci. 2016 Mar;19(3):404-13. doi: 10.1038/nn.4238.
9. Mapping anhedonia onto reinforcement learning: a behavioural meta-analysis.
   Biol Mood Anxiety Disord. 2013 Jun 19;3(1):12. doi: 10.1186/2045-5380-3-12.
10. Heterogeneity of strategy use in the Iowa gambling task: a comparison of win-stay/lose-shift and reinforcement learning models.
   Psychon Bull Rev. 2013 Apr;20(2):364-71. doi: 10.3758/s13423-012-0324-9.