Improving prediction of COVID-19 mortality using machine learning in the Spanish SEMI-COVID-19 registry.

Affiliations

Internal Medicine Department, Infanta Cristina University Hospital, Parla, 28981, Madrid, Spain.

Department of Pediatric Endocrinology, Hospital HM Nens, HM Hospitales, 08009, Barcelona, Spain.

Publication information

Intern Emerg Med. 2023 Sep;18(6):1711-1722. doi: 10.1007/s11739-023-03338-0. Epub 2023 Jun 22.

Abstract

COVID-19 is responsible for high mortality, but robust machine learning-based predictors of mortality are lacking. This study aimed to develop a model for predicting mortality in patients hospitalized with COVID-19 using Gradient Boosting Decision Trees (GBDT). The Spanish SEMI-COVID-19 registry includes 24,514 pseudo-anonymized cases of patients hospitalized with COVID-19 between 1 February 2020 and 5 December 2021. These data were used to train a GBDT machine learning model, employing the CatBoost classifier together with BorutaShap feature selection to identify the most relevant indicators and to generate a mortality prediction model that outputs a risk score ranging from 0 to 1. The model was validated by splitting patients according to admission date: patients admitted from 1 February to 31 December 2020 (first and second waves, pre-vaccination period) formed the training group, and patients admitted from 1 January to 30 November 2021 (vaccination period) formed the test group. An ensemble of ten models with different random seeds was constructed, using 80% of the training-period patients for training and the final 20%, admitted at the end of the training period, for cross-validation. The area under the receiver operating characteristic curve (AUC) was used as the performance metric. Clinical and laboratory data from 23,983 patients were analyzed. Using 16 features, the CatBoost mortality prediction models achieved an AUC of 84.76 (standard deviation 0.45) in the test group (potentially vaccinated patients not included in model training). Although it requires a relatively large number of predictors, the 16-parameter GBDT model shows a high capacity for predicting COVID-19 hospital mortality.
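The validation scheme described in the abstract (temporal train/test split by admission date, a chronological 80/20 hold-out inside the training period, averaging of seeded ensemble members, and AUC scoring) can be sketched in plain Python. This is a minimal illustrative sketch under stated assumptions, not the authors' code: the record layout `(admission_date, features, died)` and all helper names are hypothetical, each ensemble member is stood in for by any callable returning a 0-1 risk, and AUC is computed with the rank-based (Mann-Whitney) formulation on a 0-1 scale (the abstract appears to report it on a 0-100 scale). In a real pipeline each member would presumably be a `catboost.CatBoostClassifier` fitted with a different `random_seed` and scored with `predict_proba(X)[:, 1]`; BorutaShap feature selection is not shown.

```python
from datetime import date
from statistics import mean

# Hypothetical record layout (not the registry's schema): (admission_date, features, died)
TRAIN_START, TRAIN_END = date(2020, 2, 1), date(2020, 12, 31)  # pre-vaccination period
TEST_START, TEST_END = date(2021, 1, 1), date(2021, 11, 30)    # vaccination period


def temporal_split(records):
    """Assign patients to the training or test group by admission date."""
    train = [r for r in records if TRAIN_START <= r[0] <= TRAIN_END]
    test = [r for r in records if TEST_START <= r[0] <= TEST_END]
    return train, test


def chronological_holdout(train, frac=0.8):
    """Fit on the earliest 80% of admissions; validate on the latest 20%."""
    ordered = sorted(train, key=lambda r: r[0])
    cut = int(len(ordered) * frac)
    return ordered[:cut], ordered[cut:]


def ensemble_risk(models, x):
    """Average the 0-1 mortality risk returned by each seeded model."""
    return mean(m(x) for m in models)


def auc(labels, scores):
    """ROC AUC: probability that a random death is scored above a random survivor (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined unless both outcomes are present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For example, `auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns 0.75, since three of the four death/survivor pairs are ranked correctly. Drawing the cross-validation set from the end of the training period rather than at random presumably keeps validation closer in time to the test period, mirroring the forward-in-time test split.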
