Spittal Matthew J, Guo Xianglin Aneta, Kang Laurant, Kirtley Olivia J, Clapperton Angela, Hawton Keith, Kapur Nav, Pirkis Jane, Carter Greg
Centre for Mental Health and Community Wellbeing, Melbourne School of Population and Global Health, The University of Melbourne, Melbourne, Australia.
Hunter New England Local Health District, Waratah, Australia.
PLoS Med. 2025 Sep 11;22(9):e1004581. doi: 10.1371/journal.pmed.1004581. eCollection 2025 Sep.
There has been rapid expansion in the development of machine learning algorithms to predict suicidal behaviours. To test the accuracy of these algorithms for predicting suicide and hospital-treated self-harm, we undertook a systematic review and meta-analysis. The study was registered (PROSPERO CRD42024523074).
We searched PubMed, PsycINFO, Scopus, EMBASE, IEEE, Medline, CINAHL and Web of Science from database inception until 30 April 2025 to identify studies using machine learning algorithms to predict suicide, self-harm and a combined suicide/self-harm outcome. Studies were included if they examined suicide or hospital-treated self-harm outcomes using a case-control, case-cohort or cohort study design. Studies were excluded if they used self-reported outcomes or examined outcomes using other study designs. Accuracy was assessed using statistical methods appropriate for diagnostic accuracy studies. Fifty-three studies met the inclusion criteria. The area under the receiver operating characteristic curve ranged from 0.69 to 0.93. Sensitivity was 45%-82% and specificity was 91%-95%. Positive likelihood ratios (LR+) were 6.5-9.9 and negative likelihood ratios were 0.2-0.6. Using in-sample prevalence values, the positive predictive values ranged from 6% to 17%. Using out-of-sample prevalence values at an LR+ value of 10, the positive predictive value was 0.1% in low prevalence populations, 17% in medium prevalence populations and 66% in high prevalence populations. The main study limitations were the exclusion of relevant studies where we could not extract sufficient information to calculate accuracy statistics and between-study differences in the follow-up time over which the outcomes were observed.
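The dependence of positive predictive value on prevalence can be illustrated with the standard odds form of Bayes' theorem (post-test odds = pre-test odds × LR+). The sketch below is not the authors' code. The function names are hypothetical, and the three prevalence values (0.01%, 2% and 16%) are assumptions chosen only because they give predictive values close to those reported at LR+ = 10; the abstract does not state the prevalences used.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from sensitivity and specificity given as proportions."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


def ppv_from_lr(prevalence, lr_pos):
    """Positive predictive value via the odds form of Bayes' theorem."""
    pre_test_odds = prevalence / (1.0 - prevalence)
    post_test_odds = pre_test_odds * lr_pos
    return post_test_odds / (1.0 + post_test_odds)


if __name__ == "__main__":
    # Assumed, illustrative prevalences; not taken from the paper.
    assumed_prevalence = {"low": 0.0001, "medium": 0.02, "high": 0.16}
    for label, prevalence in assumed_prevalence.items():
        ppv = ppv_from_lr(prevalence, lr_pos=10.0)
        print(f"{label:>6} prevalence {prevalence:.4%}: PPV = {ppv:.1%}")
```

Even with an LR+ of 10, most people flagged as high risk in a low or medium prevalence population would not go on to have the outcome, which is the arithmetic underlying the conclusion that follows.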
The accuracy of machine learning algorithms for predicting suicidal behaviour is too low to be useful for screening (case finding) or for prioritising high-risk individuals for interventions (treatment allocation). For hospital-treated self-harm populations, management should instead include three components for all patients: a needs-based assessment and response, identification of modifiable risk factors with treatment intended to reduce those exposures, and implementation of demonstrated effective aftercare interventions.