Massell Johannes, Preisig Martin, Miché Marcel, Strippoli Marie-Pierre F, Pistis Giorgio, Lieb Roselind
Division of Clinical Psychology and Epidemiology, Department of Psychology, University of Basel, Missionsstrasse 62a, Basel, 4055, Switzerland.
Psychiatric Epidemiology and Psychopathology Research Center, Department of Psychiatry, Lausanne University Hospital and University of Lausanne, Rte de Cery 25, 1008 Prilly, Switzerland.
Soc Psychiatry Psychiatr Epidemiol. 2025 Jun 18. doi: 10.1007/s00127-025-02942-z.
In this paper, we use machine learning (ML) models to prospectively predict the first onset of Major Depressive Disorder (MDD), one of the most common and disabling mental health conditions. Although such prediction models could enable early interventions, few studies have applied ML approaches to this task, and those that have are heterogeneous. Moreover, the clinical utility of these predictive models remains largely unexamined.
Data stemmed from CoLaus|PsyCoLaus, a population-based cohort study. In total, 1350 participants aged 35-66 years without lifetime MDD at baseline completed the physical and psychiatric baseline evaluations and at least one psychiatric follow-up evaluation. Models based on logistic regression, elastic net, random forests, and XGBoost were trained using an extensive array of psychosocial, environmental, biological, and genetic predictors. Discriminative performance, calibration, clinical utility, and individual predictor contributions were assessed using nested cross-validation.
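The nested cross-validation scheme mentioned above can be sketched roughly as follows. This is a minimal, index-only illustration and not the study's actual pipeline: the fold counts (5 outer, 3 inner), the random seed, the absence of stratification, and the function names `k_fold_indices` and `nested_cv_splits` are all assumptions made for this example. The point is that hyperparameters would be tuned only on the inner splits, while each outer test fold stays untouched until the final performance estimate.

```python
import random


def k_fold_indices(indices, k, rng):
    """Shuffle the given indices and split them into k nearly equal folds."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def nested_cv_splits(n_samples, outer_k=5, inner_k=3, seed=0):
    """Yield (outer_train, outer_test, inner_splits) for nested cross-validation.

    inner_splits is a list of (inner_train, inner_val) pairs drawn only from
    outer_train, so the outer test fold never influences model tuning.
    Fold counts and seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    outer_folds = k_fold_indices(range(n_samples), outer_k, rng)
    for i, outer_test in enumerate(outer_folds):
        # All outer folds except fold i form the outer training set.
        outer_train = [
            idx for j, fold in enumerate(outer_folds) if j != i for idx in fold
        ]
        inner_folds = k_fold_indices(outer_train, inner_k, rng)
        inner_splits = []
        for m, inner_val in enumerate(inner_folds):
            inner_train = [
                idx for j, fold in enumerate(inner_folds) if j != m for idx in fold
            ]
            inner_splits.append((inner_train, inner_val))
        yield outer_train, outer_test, inner_splits


# 1350 matches the number of participants reported in the abstract.
splits = list(nested_cv_splits(1350))
```

In a full analysis, each candidate hyperparameter setting would be scored on the inner validation folds, the best setting refit on the whole outer training set, and the resulting model evaluated once on the corresponding outer test fold.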
Discriminative performance was comparable between models (areas under the precision-recall curve between 0.36 and 0.38; areas under the receiver operating characteristic curve between 0.65 and 0.68). Decision curve analysis suggested clinical utility of logistic regression, elastic net, and random forests for threshold probabilities between 10% and 40%. Across all models, neuroticism, sex, and age were the most important predictors.
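Decision curve analysis, as referenced above, compares the net benefit of a model against default strategies such as intervening for everyone or for no one. The sketch below uses the standard net benefit formula, TP/N - (FP/N) * pt/(1 - pt), where pt is the threshold probability. The toy labels and predicted probabilities are invented purely for illustration and are not study data; the function names are likewise assumptions.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of intervening when predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of intervening for everyone, regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy data (hypothetical): 1 = first-onset MDD during follow-up, 0 = no onset.
y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
y_prob = [0.60, 0.10, 0.30, 0.45, 0.05, 0.20, 0.50, 0.15, 0.10, 0.25]

# Thresholds spanning the 10% to 40% range discussed in the abstract.
for pt in (0.1, 0.2, 0.3, 0.4):
    model_nb = net_benefit(y_true, y_prob, pt)
    all_nb = net_benefit_treat_all(y_true, pt)
    print(f"pt={pt:.1f}  model={model_nb:.3f}  treat-all={all_nb:.3f}  treat-none=0.000")
```

A model is usually considered clinically useful at a given threshold when its net benefit exceeds both the treat-all value and zero (the treat-none strategy).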
Although the prediction models achieved discriminative performance above chance level, further refinement is necessary. Adding biological and genetic predictors did not markedly improve performance. Additional research seems warranted given the limited number and heterogeneous nature of existing studies, the burden associated with MDD, and the potential to improve overall outcomes for people at risk for MDD.