Prediction of Suicidal Behaviors in the Middle-aged Population: Machine Learning Analyses of UK Biobank.

Affiliations

West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, China.

Med-X Center for Informatics, Sichuan University, Chengdu, China.

Publication information

JMIR Public Health Surveill. 2023 Feb 20;9:e43419. doi: 10.2196/43419.

Abstract

BACKGROUND

Suicidal behaviors, including suicide deaths and attempts, are major public health concerns. However, previous suicide prediction models required a large number of input features, which limited their applicability in clinical practice.

OBJECTIVE

We aimed to construct applicable models (ie, models with a limited number of features) for short- and long-term suicidal behavior prediction. We further validated these models among individuals with different genetic risks of suicide.

METHODS

Based on the prospective UK Biobank cohort, we identified 223 (0.06%) eligible cases of suicide attempts or deaths occurring within 1 year of baseline, according to hospital inpatient or death register data, and randomly selected 4460 (1.18%) controls without such records (1:20). We similarly identified 833 (0.22%) cases of suicidal behaviors occurring 1 to 6 years after baseline and 16,660 (4.42%) corresponding controls. Using 143 input features, mainly sociodemographic, environmental, and psychosocial factors; medical history; and polygenic risk scores (PRS) for suicidality, we applied a bagged balanced light gradient-boosting machine (LightGBM) with stratified 10-fold cross-validation and grid search to construct full prediction models for suicide attempts or deaths within 1 year and for those between 1 and 6 years. The Shapley Additive Explanations (SHAP) approach was used to quantify the importance of input features, and the top 20 features with the highest SHAP values were selected to train the applicable models. The external validity of the established models was assessed among 50,310 individuals who participated in the UK Biobank repeat assessments, both overall and stratified by the level of PRS for suicidality.
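Three sampling and feature-selection steps of this design can be sketched in plain Python: the 1:20 control sampling, the stratified 10-fold split, and top-20 feature selection from SHAP values. This is a minimal illustration under stated assumptions, not the authors' code. The function names are hypothetical. Ranking features by *mean absolute* SHAP value is an assumption, because the abstract only says "highest SHAP values". The bagged balanced LightGBM and the grid search are omitted. In practice the SHAP matrix would come from a fitted model rather than being supplied by hand.

```python
import random
from collections import defaultdict


def sample_controls(case_ids, candidate_control_ids, ratio=20, seed=0):
    """Randomly draw `ratio` controls per case, without replacement (hypothetical helper)."""
    rng = random.Random(seed)
    return rng.sample(list(candidate_control_ids), ratio * len(case_ids))


def stratified_folds(labels, k=10, seed=0):
    """Assign every sample to one of k folds so that each fold keeps roughly
    the same case:control ratio as the full data set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [0] * len(labels)
    position = 0
    for y in sorted(by_class):
        indices = by_class[y]
        rng.shuffle(indices)
        # Deal this class's shuffled indices round-robin over the k folds,
        # so per-fold counts of each class differ by at most one.
        for idx in indices:
            folds[idx] = position % k
            position += 1
    return folds


def top_k_features(shap_matrix, feature_names, k=20):
    """ASSUMPTION: importance = mean absolute SHAP value per feature
    (rows = samples, columns = features); return the k highest names."""
    n = len(shap_matrix)
    importance = {
        name: sum(abs(row[j]) for row in shap_matrix) / n
        for j, name in enumerate(feature_names)
    }
    return sorted(feature_names, key=importance.get, reverse=True)[:k]


# Illustrative usage with the 1-year case/control counts from the abstract.
labels = [1] * 223 + [0] * 4460
folds = stratified_folds(labels, k=10)
cases_per_fold = [sum(1 for f, y in zip(folds, labels) if f == i and y == 1)
                  for i in range(10)]
print(cases_per_fold)  # [23, 23, 23, 22, 22, 22, 22, 22, 22, 22]
```

Dealing each class round-robin is one simple way to stratify. Library implementations such as scikit-learn's `StratifiedKFold` follow the same idea of preserving class proportions in every fold.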

RESULTS

Individuals with suicidal behaviors were on average 56 years old, with an equal sex distribution. When applied to the external validation data set, the full models performed well, with areas under the receiver operating characteristic curve (AUROCs) of 0.919 and 0.892 for prediction within 1 year and between 1 and 6 years, respectively. Importantly, the applicable models using only the top 20 most important features achieved externally validated performance comparable to that of the full models (AUROCs of 0.901 and 0.885). Based on these applicable models, individuals in the top quintile of predicted risk accounted for 91.7% (n=11) and 80.7% (n=25) of all suicidality cases within 1 year and during 1 to 6 years, respectively. We obtained comparable prediction accuracy when applying these models to subpopulations with different genetic susceptibilities to suicidality. For example, for 1-year risk prediction, the AUROCs were 0.907 and 0.885 in the high (PRS above the 2nd tertile) and low (PRS below the 1st tertile) genetic susceptibility groups, respectively.
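The evaluation quantities reported here can be computed from a vector of predicted risks and binary outcomes. These are the AUROC, the share of cases falling in the top quintile of predicted risk, and AUROC within PRS tertile groups. The block below is an illustrative plain-Python sketch, not the authors' evaluation code. The function names are hypothetical. AUROC uses the rank-sum (Mann-Whitney U) identity with average ranks for ties. The quintile cut-off (`len(scores) // 5`) and arbitrary tie-breaking at that cut-off are our own assumptions.

```python
import statistics


def auroc(scores, labels):
    """AUROC via the rank-sum (Mann-Whitney U) identity; tied scores get
    their average rank. labels: 1 = case, 0 = control."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a block of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1  # 1-based average rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def top_quintile_capture(scores, labels):
    """Fraction (and count) of all cases whose predicted risk is in the top
    20% of scores. ASSUMPTION: cut-off = len // 5, ties broken arbitrarily."""
    n_top = len(scores) // 5
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    captured = sum(labels[i] for i in ranked[:n_top])
    return captured / sum(labels), captured


def prs_tertile_groups(prs):
    """Label individuals 'low' (below the 1st tertile cut-point), 'high'
    (above the 2nd) or 'middle', using statistics.quantiles (Python 3.8+)."""
    t1, t2 = statistics.quantiles(prs, n=3)
    return ["low" if p < t1 else "high" if p > t2 else "middle" for p in prs]


def auroc_by_group(scores, labels, groups, group):
    """AUROC restricted to individuals in one PRS group."""
    idx = [i for i, g in enumerate(groups) if g == group]
    return auroc([scores[i] for i in idx], [labels[i] for i in idx])


print(auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

The rank-based formulation runs in O(n log n), so it stays practical at the scale of the 50,310-person validation set, whereas comparing every case-control pair directly would be quadratic.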

CONCLUSIONS

We established applicable machine learning-based models for predicting both the short- and long-term risk of suicidality with high accuracy across populations of varying genetic risk for suicide, highlighting a cost-effective method of identifying individuals with a high risk of suicidality.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a67e/9989910/3a00e0c342c4/publichealth_v9i1e43419_fig1.jpg