Department of Psychiatry, Dalhousie University, Halifax, NS, Canada.
Department of Psychiatry, University of British Columbia, Vancouver, BC, Canada.
Psychol Med. 2021 Dec;51(16):2742-2751. doi: 10.1017/S0033291721003871. Epub 2021 Oct 12.
Multiple treatments are effective for major depressive disorder (MDD), but the outcomes of each treatment vary broadly among individuals. Accurate prediction of outcomes is needed to help select a treatment that is likely to work for a given person. We aim to examine the performance of machine learning methods in delivering replicable predictions of treatment outcomes.
Of 7732 non-duplicate records identified through literature search, we retained 59 eligible reports and extracted data on sample, treatment, predictors, machine learning method, and treatment outcome prediction. Studies were classed as adequate-quality if they had a minimum sample size of 100 and used an adequate validation method. The effects of study features on prediction accuracy were tested with mixed-effects models. Fifty-four studies reported accuracy or other estimates from which the balanced accuracy of treatment outcome prediction could be calculated.
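The abstract does not state how balanced accuracy was derived from each report's estimates. As a minimal sketch, assuming the standard definition (the mean of sensitivity and specificity) and confusion-matrix counts as input, the calculation and the two stated quality criteria could look like this. The function and parameter names are hypothetical and do not come from the review.

```python
def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Mean of sensitivity and specificity (standard definition, assumed here)."""
    sensitivity = tp / (tp + fn)  # proportion of true positives correctly predicted
    specificity = tn / (tn + fp)  # proportion of true negatives correctly predicted
    return (sensitivity + specificity) / 2


def is_adequate_quality(sample_size: int, adequate_validation: bool,
                        min_sample_size: int = 100) -> bool:
    """Apply the two criteria named in the abstract: a minimum sample size
    of 100 and an adequate validation method. What counts as 'adequate'
    validation is judged elsewhere and passed in as a flag."""
    return sample_size >= min_sample_size and adequate_validation


# Illustrative counts only; not data from any study in the review.
example = balanced_accuracy(tp=30, fn=10, tn=45, fp=15)
```

Unlike raw accuracy, balanced accuracy is not inflated when one outcome class is much more common than the other, which makes it a more comparable metric across studies with different outcome rates.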
Eight adequate-quality studies reported a mean accuracy of 0.63 [95% confidence interval (CI) 0.56-0.71], which was significantly lower than a mean accuracy of 0.75 (95% CI 0.72-0.78) in the other 46 studies. Among the adequate-quality studies, accuracies were higher when predicting treatment resistance (0.69) and lower when predicting remission (0.60) or response (0.56). The choice of machine learning method, feature selection, and the ratio of features to individuals were not associated with reported accuracy.
The negative relationship between study quality and prediction accuracy, combined with a lack of independent replication, invites caution when evaluating the potential of machine learning applications for personalizing the treatment of depression.