Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden.
Department of Psychiatry and Behavioral Sciences, SUNY Upstate Medical University, Syracuse, New York, United States of America.
PLoS Med. 2020 Nov 6;17(11):e1003416. doi: 10.1371/journal.pmed.1003416. eCollection 2020 Nov.
BACKGROUND: Suicide is a major public health concern globally. Accurately predicting suicidal behavior remains challenging. This study aimed to use machine learning approaches to examine the potential of the Swedish national registry data for prediction of suicidal behavior.
METHODS AND FINDINGS: The study sample consisted of 541,300 inpatient and outpatient visits to psychiatric specialty care in Sweden between January 1, 2011 and December 31, 2012, made by 126,205 Sweden-born patients (54% female, 46% male) aged 18 to 39 years (mean age at the visit: 27.3 years). The most common psychiatric diagnoses at the visit were anxiety disorders (20.0%), major depressive disorder (16.9%), and substance use disorders (13.6%). A total of 425 candidate predictors, covering demographic characteristics, socioeconomic status (SES), electronic medical records, criminality, and family history of disease and crime, were extracted from the Swedish registry data. The sample was randomly split into an 80% training set containing 433,024 visits and a 20% test set containing 108,276 visits. Using multiple machine learning algorithms, separate models were trained for suicide attempt/death within 90 days and within 30 days following a visit. Model discrimination and calibration were both evaluated. Among all eligible visits, 3.5% (18,682) were followed by a suicide attempt/death within 90 days and 1.7% (9,099) within 30 days. The final models were based on ensemble learning that combined predictions from elastic net penalized logistic regression, random forest, gradient boosting, and a neural network. The areas under the receiver operating characteristic (ROC) curve (AUCs) on the test set were 0.88 (95% confidence interval [CI] = 0.87-0.89) and 0.89 (95% CI = 0.88-0.90) for the outcome within 90 days and 30 days, respectively, both significantly better than chance (i.e., AUC = 0.50) (p < 0.01). Sensitivity, specificity, and predictive values were reported at different risk thresholds.
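The discrimination and threshold statistics reported for the test set can be illustrated with a short Python sketch. It assumes, purely for illustration, that the ensemble is an unweighted average of the base models' predicted probabilities; the abstract does not say how the four learners' predictions were combined. The function names (`ensemble_average`, `rank_auc`, `threshold_metrics`) and the six toy visits are hypothetical, not study data, and no confidence interval is computed because the abstract does not describe how the 95% CIs were obtained.

```python
from statistics import fmean


def ensemble_average(*model_probs):
    """Unweighted mean of per-visit probabilities from several base models.

    Hypothetical combination rule: the abstract does not state how the
    elastic net, random forest, gradient boosting, and neural network
    predictions were combined.
    """
    return [fmean(visit_probs) for visit_probs in zip(*model_probs)]


def rank_auc(y_true, scores):
    """AUC via the Mann-Whitney rank-sum statistic (ties get average ranks)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Extend j over the block of visits sharing the same score.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one event and one non-event")
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def threshold_metrics(y_true, scores, threshold):
    """Sensitivity, specificity, PPV, and NPV when visits with
    predicted risk >= threshold are flagged as high risk."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        flagged = s >= threshold
        if flagged and y == 1:
            tp += 1
        elif flagged:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Toy example: six visits, 1 = suicide attempt/death in the follow-up window.
y = [0, 0, 1, 1, 0, 1]
model_a = [0.125, 0.5, 0.25, 0.75, 0.25, 0.75]
model_b = [0.125, 0.25, 0.5, 0.75, 0.0, 0.5]
risk = ensemble_average(model_a, model_b)
print(rank_auc(y, risk))
for t in (0.3, 0.5):
    print(t, threshold_metrics(y, risk, t))
```

With these toy values, `rank_auc` returns 8.5/9 (about 0.94), and at a threshold of 0.5 the sketch gives sensitivity 2/3, specificity 1.0, PPV 1.0, and NPV 0.75. The rank-sum form avoids comparing every event with every non-event, which matters for a test set of roughly 108,000 visits.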
A limitation of our study is that our models have not yet been externally validated; thus, the generalizability of the models to other populations remains unknown.
CONCLUSIONS: By combining an ensemble of multiple machine learning algorithms with high-quality data solely from the Swedish registers, we developed prognostic models that predict short-term suicide attempt/death with good discrimination and calibration. Whether novel predictors can improve predictive performance requires further investigation.
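Calibration asks whether predicted risks match observed event rates. The abstract does not specify which calibration measures were used, so the following Python sketch shows three common choices under that assumption: a binned calibration table, calibration-in-the-large, and the Brier score. The function names and the eight toy visits are hypothetical, not study data.

```python
from statistics import fmean


def calibration_table(y_true, probs, n_bins=10):
    """Group visits into n_bins equal-size bins of predicted risk and return
    (mean predicted risk, observed event rate, number of visits) per bin."""
    pairs = sorted(zip(probs, y_true))
    n = len(pairs)
    table = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if chunk:  # skip empty bins when there are fewer visits than bins
            table.append((fmean(p for p, _ in chunk),
                          fmean(y for _, y in chunk),
                          len(chunk)))
    return table


def calibration_in_the_large(y_true, probs):
    """Mean predicted risk minus observed event rate (0 = no overall bias)."""
    return fmean(probs) - fmean(y_true)


def brier_score(y_true, probs):
    """Mean squared difference between predicted risk and the 0/1 outcome."""
    return fmean((p - y) ** 2 for p, y in zip(probs, y_true))


# Toy example: eight visits with hypothetical predicted risks.
y = [0, 0, 0, 1, 1, 0, 1, 1]
probs = [0.125, 0.125, 0.25, 0.25, 0.75, 0.75, 0.875, 0.875]
for row in calibration_table(y, probs, n_bins=2):
    print(row)
print(calibration_in_the_large(y, probs), brier_score(y, probs))
```

With these toy values, the two bins are (0.1875, 0.25, 4) and (0.8125, 0.75, 4), calibration-in-the-large is 0.0, and the Brier score is 0.1640625. At event rates as low as the 1.7% to 3.5% reported here, many bins contain few events, so bin-level observed rates can be noisy.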