Walsh Colin G, Ripperger Michael, McCoy Thomas H, Castro Victor, Hu Yirui, Kirchner H Lester, Ruderfer Douglas, Perlis Roy H
Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN.
Department of Medicine, Vanderbilt University Medical Center, Nashville, TN.
medRxiv. 2025 May 27:2025.05.21.25328089. doi: 10.1101/2025.05.21.25328089.
As multiple strategies have emerged for managing treatment-resistant major depressive disorder, efficient identification of individuals at elevated risk for this outcome earlier in their illness course remains essential.
We extracted electronic health record data for all individuals with a diagnosis of major depressive disorder who received an index antidepressant prescription in the clinical networks of three geographically distinct health systems - Mass General-Brigham (MGB), Vanderbilt University Medical Center (VUMC), and Geisinger Clinic (GC) - between April 1, 2004, and March 30, 2022. The primary outcome, treatment-resistant depression, was defined as provision of electroconvulsive therapy, transcranial magnetic stimulation, or vagus nerve stimulation; prescription of ketamine, esketamine, or monoamine oxidase inhibitors (MAOIs); or failed trials of more than two antidepressants. We applied L1-regularized regression to sociodemographic features, medications, and ICD-10 diagnostic code counts to fit a model of treatment resistance in each of the three cohorts. For each model, we then estimated generalizable performance (that is, external validity) in the other two cohorts. Agreement between models was measured with concordance correlation coefficients (CCCs), and random forest regression was used to estimate the importance of features predicting discordance.
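To illustrate the agreement measure named above, the following is a minimal sketch of Lin's concordance correlation coefficient for two vectors of predicted risks on the same individuals. It assumes the standard population-moment (1/n) definition; the abstract does not state which estimator or software the authors used, and the function name and example values are hypothetical.

```python
from statistics import fmean


def concordance_correlation(x, y):
    """Lin's CCC between two equal-length sequences of predictions.

    Assumption: population (1/n) variances and covariance are used.
    The result is undefined (division by zero) if both inputs are
    constant and share the same mean.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = fmean(x), fmean(y)
    var_x = fmean((a - mean_x) ** 2 for a in x)
    var_y = fmean((b - mean_y) ** 2 for b in y)
    cov_xy = fmean((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Unlike Pearson's r, the denominator penalizes a shift in means,
    # so two models that rank patients identically but differ in
    # calibration still score below 1.
    return 2 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


# Hypothetical predicted risks from two site-specific models
# applied to the same three test-set patients.
risk_model_a = [0.1, 0.2, 0.3]
risk_model_b = [0.2, 0.3, 0.4]
ccc = concordance_correlation(risk_model_a, risk_model_b)
```

In this toy case the two vectors are perfectly linearly related, yet the CCC is below 1 because one model's predictions are uniformly higher. That is the property that distinguishes the CCC from a plain correlation.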
Across sites, discrimination as measured by the area under the receiver operating characteristic curve (AUROC) ranged from 0.58 to 0.64 on internal validation and from 0.51 to 0.58 on external validation. The area under the precision-recall curve (AUPRC) ranged from 0.10 to 0.13 on internal validation and averaged 0.07 to 0.13 on external validation, evaluated on the same held-out test sets at each site. On the same test set, CCCs were 0.13 for the VUMC<->MGB models, 0.18 for the VUMC<->GC models, and 0.38 for the MGB<->GC models. These results indicate that the MGB and GC models agreed with each other more closely than either agreed with the VUMC model, but no pair of models agreed well. The features most important for predicting discordance were primarily age and secondarily coded sex.
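For readers interpreting the discrimination figures, the sketch below computes AUROC directly from its pairwise definition: the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case, with ties counted as one half. This is an illustration of the metric only, not the authors' evaluation code; the function name and toy data are hypothetical.

```python
def auroc(labels, scores):
    """AUROC via pairwise comparison of positive and negative scores.

    Assumption: labels are 0/1 and both classes are present.
    Runs in O(n_pos * n_neg) time, which is fine for illustration
    but slow for large cohorts.
    """
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # case ranked above non-case
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes and predicted risks for four patients.
example_auroc = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Under this definition a value of 0.5 corresponds to chance-level ranking, which puts the external-validation range of 0.51 to 0.58 reported above only modestly above chance.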
These linear models demonstrated consistent aggregate performance but discordant individual-level performance across three disparate major health systems. The inclusion of large and heterogeneous samples suggests that further improvement may require incorporation of data types beyond those readily available in EHRs. Close attention to performance in key subgroups is indicated to ensure that models do not perform disparately or unfairly. Prospective studies are warranted to evaluate the extent to which clinical models might improve early identification and outcomes.