Altrecht Mental Health Centre, Utrecht, the Netherlands.
University Medical Center Utrecht, Department of Psychiatry, Utrecht, the Netherlands.
J Affect Disord. 2020 Jun 15;271:169-177. doi: 10.1016/j.jad.2020.03.081. Epub 2020 Apr 18.
The accuracy with which suicidal behaviour can be predicted has not improved over recent decades. We aimed to explore the potential of machine learning to predict future suicidal behaviour using population-based longitudinal data.
Baseline risk data from the Scottish Wellbeing Study, in which 3508 young adults (18-34 years) completed a battery of psychological measures, were used to predict both suicide ideation and suicide attempts at one-year follow-up. The performance of the following algorithms was compared: standard logistic regression, K-nearest neighbors, classification tree, random forests, gradient boosting and support vector machine.
At one-year follow-up, 2428 respondents (71%) completed the second assessment. Of these, 336 respondents (14%) reported suicide ideation between baseline and follow-up, and 50 (2%) reported a suicide attempt. All performance metrics were highly similar across methods. The random forest algorithm performed best for predicting suicide ideation (AUC 0.83, PPV 0.52, BA 0.74) and gradient boosting performed best for predicting suicide attempt (AUC 0.80, PPV 0.10, BA 0.69).
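As an illustration of the three reported metrics, the following is a minimal sketch in plain Python, assuming that BA denotes balanced accuracy (the mean of sensitivity and specificity) and that PPV is computed after thresholding a predicted risk score. The function names, the toy labels and scores, and the 0.35 threshold are hypothetical and are not taken from the study.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def ppv(y_true, y_pred):
    """Positive predictive value: share of predicted positives that are true positives."""
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if (tp + fp) else float("nan")


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity; assumes both classes are present."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def auc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 2 positives among 10 respondents.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.8, 0.3, 0.2, 0.1, 0.1, 0.05, 0.05, 0.02]
threshold = 0.35  # illustrative cut-off, not from the study
y_pred = [1 if s >= threshold else 0 for s in scores]

print("AUC:", auc(y_true, scores))
print("PPV:", ppv(y_true, y_pred))
print("BA: ", balanced_accuracy(y_true, y_pred))
```

AUC is computed from the scores alone, whereas PPV and BA depend on the chosen threshold. PPV also falls as the outcome becomes rarer, so a low PPV for a rare outcome such as a suicide attempt can coexist with a reasonably high AUC.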
The number of respondents with suicidal behaviour at follow-up was small. We only had data on psychological risk factors, which limited the potential of the more complex machine learning algorithms to outperform standard logistic regression.
When applied to population-based longitudinal data containing multiple psychological measurements, machine learning techniques did not significantly improve the accuracy of predicting suicidal behaviour. Adding more detailed data on, for example, employment, education or previous health care uptake might allow machine learning to outperform standard logistic regression.