Xingche Guo, Donglin Zeng, Yuanjia Wang
Department of Biostatistics, Columbia University, 722 West 168th St, New York, NY, 10032, United States.
Department of Biostatistics, University of Michigan, 1415 Washington Heights, Ann Arbor, MI, 48109, United States.
Biostatistics. 2024 Dec 31;26(1). doi: 10.1093/biostatistics/kxae033.
Major depressive disorder (MDD), a leading cause of years lived with disability, presents challenges in diagnosis and treatment due to its complex and heterogeneous nature. Emerging evidence indicates that reward processing abnormalities may serve as a behavioral marker for MDD. To measure reward processing, patients perform computer-based behavioral tasks in the laboratory that involve making choices or responding to stimuli associated with different outcomes, such as gains or losses. Reinforcement learning (RL) models are fitted to extract parameters that measure various aspects of reward processing (e.g., reward sensitivity) and thereby characterize how patients make decisions in behavioral tasks. Recent findings suggest that a single RL model is inadequate to characterize reward learning; instead, decision-making may switch among multiple strategies over time. An important scientific question is how these strategy dynamics affect the reward learning ability of individuals with MDD. Motivated by the probabilistic reward task in the Establishing Moderators and Biosignatures of Antidepressant Response in Clinical Care (EMBARC) study, we propose a novel RL-HMM (hidden Markov model) framework for analyzing reward-based decision-making. Our model accommodates switching between two distinct decision-making strategies under an HMM: subjects either make decisions according to the RL model or make random choices. We accommodate a continuous RL state space and allow time-varying transition probabilities in the HMM. We introduce a computationally efficient expectation-maximization (EM) algorithm for parameter estimation and use a nonparametric bootstrap for inference. Extensive simulation studies validate the finite-sample performance of our method. Applying the approach to the EMBARC study, we show that MDD patients are less engaged in RL than healthy controls, and that engagement is associated with brain activity in the negative affect circuitry during an emotional conflict task.
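To make the modeling idea concrete, the following is a minimal sketch of one plausible RL-HMM specification consistent with the abstract's description. The notation here (latent engagement state Z_t, Q-values Q_t, learning rate α, inverse temperature β, transition probabilities p_{jk}(t)) is illustrative and not necessarily the paper's own.

```latex
% Sketch, assuming a two-armed choice task with latent engagement state
% Z_t = 1 (RL-engaged) or Z_t = 0 (random choice); notation is ours.
% Choice probability for action a at trial t, given expected rewards Q_t:
\[
P(A_t = a \mid Z_t = 1, Q_t)
  = \frac{\exp\{\beta\, Q_t(a)\}}{\sum_{a'} \exp\{\beta\, Q_t(a')\}},
\qquad
P(A_t = a \mid Z_t = 0) = \frac{1}{|\mathcal{A}|}.
\]
% Continuous RL state: Q-values evolve by a delta rule with learning rate alpha:
\[
Q_{t+1}(a) = Q_t(a) + \alpha \,\{R_t - Q_t(a)\}\,\mathbf{1}\{A_t = a\}.
\]
% Time-varying HMM transitions between the two decision-making strategies:
\[
P(Z_{t+1} = k \mid Z_t = j) = p_{jk}(t), \qquad j, k \in \{0, 1\}.
\]
```

Under a mixture structure of this kind, the softmax and uniform choice probabilities act as the HMM emission distributions, so the standard forward-backward recursions can supply the E-step of an EM algorithm of the type the abstract describes.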