Machine learning for prediction of in-hospital mortality in coronavirus disease 2019 patients: results from an Italian multicenter study.

BACKGROUND: Several risk factors have been identified that predict worse outcomes in patients with SARS-CoV-2 infection. Machine learning algorithms offer a novel approach to building a prediction model with good discriminatory capacity that can be used easily in clinical practice. The aim of this study was to derive a risk score for in-hospital mortality in patients with coronavirus disease 2019 (COVID-19) based on a limited number of features collected at hospital admission.

METHODS AND RESULTS: We studied an Italian cohort of consecutive adult Caucasian patients with laboratory-confirmed COVID-19 who were hospitalized in 13 cardiology units during Spring 2020. The Lasso procedure was used to select the most relevant covariates. The dataset was randomly divided into a training set containing 80% of the data, used for estimating the model, and a test set with the remaining 20%. A Random Forest modeled in-hospital mortality with the selected set of covariates; its accuracy was assessed with the receiver operating characteristic (ROC) curve, yielding the area under the curve (AUC), sensitivity, and specificity with their 95% confidence intervals (CIs). This model was then compared with a Gradient Boosting Machine (GBM) and with logistic regression. Finally, to check whether each model performed equally well on the training and test sets, the two AUCs were compared using DeLong's test. Among 701 patients enrolled (mean age 67.2 ± 13.2 years, 69.5% male), 165 (23.5%) died during a median hospitalization of 15 (IQR, 9-24) days. The variables selected by the Lasso procedure were age, oxygen saturation, PaO2/FiO2, creatinine clearance and elevated troponin. Compared with those who survived, deceased patients were older and had lower blood oxygenation, lower creatinine clearance and a higher prevalence of elevated troponin (all P < 0.001). The best out-of-sample performance was provided by Random Forest, with an AUC of 0.78 (95% CI: 0.68-0.88) and a sensitivity of 0.88 (95% CI: 0.58-1.00). Moreover, Random Forest was the only model that performed similarly in sample and out of sample (DeLong test P = 0.78).

CONCLUSION: In a large COVID-19 population, we showed that a customizable machine learning-based score derived from clinical variables is feasible and effective for the prediction of in-hospital mortality.
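The workflow described in the methods (Lasso selection, 80/20 split, then Random Forest compared with GBM and logistic regression by AUC) can be sketched roughly as follows. This is a minimal illustration and not the authors' code: it uses scikit-learn, a synthetic dataset from `make_classification` that only mimics the cohort size and death rate, an L1-penalized logistic regression as a stand-in for the Lasso step, and arbitrary hyperparameters. Fitting the selection step on the training split only is also an assumption, since the abstract does not state the order. DeLong's test and the confidence intervals are left out because scikit-learn does not provide them.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cohort (NOT study data): 701 rows,
# roughly 23.5% positive class, 20 candidate covariates.
X, y = make_classification(
    n_samples=701, n_features=20, n_informative=5,
    weights=[0.765], random_state=0,
)

# 80% training / 20% test split, stratified on the outcome.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Lasso-style covariate selection: an L1 penalty drives weak coefficients
# to zero; covariates with non-zero coefficients are kept.
scaler = StandardScaler().fit(X_train)
selector = SelectFromModel(
    LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
).fit(scaler.transform(X_train), y_train)
keep = selector.get_support()  # boolean mask over the 20 covariates

# Candidate models; hyperparameters are illustrative guesses.
models = {
    "random_forest": RandomForestClassifier(
        n_estimators=200, min_samples_leaf=5, random_state=0
    ),
    "gbm": GradientBoostingClassifier(random_state=0),
    "logistic": LogisticRegression(max_iter=1000),
}

# Fit each model on the selected covariates and record in-sample and
# out-of-sample AUC, the two quantities the study compares.
results = {}
for name, model in models.items():
    model.fit(X_train[:, keep], y_train)
    auc_train = roc_auc_score(y_train, model.predict_proba(X_train[:, keep])[:, 1])
    auc_test = roc_auc_score(y_test, model.predict_proba(X_test[:, keep])[:, 1])
    results[name] = (auc_train, auc_test)
    print(f"{name}: train AUC = {auc_train:.2f}, test AUC = {auc_test:.2f}")
```

A large gap between the training and test AUC for a model is the kind of discrepancy the DeLong comparison in the study is meant to detect; the numbers printed here come from synthetic data and say nothing about the published results.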



