Predicting COVID-19 mortality risk in Toronto, Canada: a comparison of tree-based and regression-based machine learning methods.

Affiliations

Department of Community Health and Epidemiology, Faculty of Medicine, Dalhousie University, 5790 University Avenue, Halifax, B3H 1V7, NS, Canada.

Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus, Aurora, Colorado, 80045, USA.

Publication information

BMC Med Res Methodol. 2021 Nov 27;21(1):267. doi: 10.1186/s12874-021-01441-4.

Abstract

BACKGROUND

Coronavirus disease (COVID-19) presents an unprecedented threat to global health. Accurately predicting the mortality risk among infected individuals is crucial for prioritizing medical care and mitigating the burden on the healthcare system. The present study aimed to assess the accuracy of machine learning methods in predicting COVID-19 mortality risk.

METHODS

We compared the performance of classification tree, random forest (RF), extreme gradient boosting (XGBoost), logistic regression, generalized additive model (GAM) and linear discriminant analysis (LDA) in predicting the mortality risk among 49,216 COVID-19-positive cases in Toronto, Canada, reported from March 1 to December 10, 2020. We used repeated split-sample validation and k-steps-ahead forecasting validation. Predictive models were estimated on the training samples, and the predictive accuracy of each method on the testing samples was assessed using the area under the receiver operating characteristic curve (AUC), Brier's score, calibration intercept and calibration slope.
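The four accuracy measures named above can be illustrated with a minimal NumPy sketch. This is not the authors' code: the function names (`auc`, `brier_score`, `calibration_intercept_slope`) are hypothetical, and the calibration intercept and slope are obtained here by jointly fitting a logistic regression of the outcome on the logit of the predicted risk, which is one common convention and an assumption about how the study defined them.

```python
import numpy as np


def auc(y, p):
    """AUC via the Mann-Whitney rank statistic; tied scores get average ranks."""
    y = np.asarray(y)
    p = np.asarray(p, dtype=float)
    order = np.argsort(p, kind="mergesort")
    ranks = np.empty(len(p), dtype=float)
    ranks[order] = np.arange(1, len(p) + 1)
    # Replace ranks of tied predictions by their mean rank (simple, not fast).
    for value in np.unique(p):
        tied = p == value
        if tied.sum() > 1:
            ranks[tied] = ranks[tied].mean()
    n_pos = int((y == 1).sum())
    n_neg = len(y) - n_pos
    return (ranks[y == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def brier_score(y, p):
    """Mean squared difference between predicted risk and observed outcome."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    return float(np.mean((p - y) ** 2))


def calibration_intercept_slope(y, p, n_iter=25, eps=1e-6):
    """Fit logit(P(y=1)) = a + b * logit(p) by Newton-Raphson.

    Assumption: intercept and slope are estimated jointly; a perfectly
    calibrated model gives a close to 0 and b close to 1.
    """
    y = np.asarray(y, dtype=float)
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)
    X = np.column_stack([np.ones_like(p), np.log(p / (1 - p))])
    beta = np.zeros(2)
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))
        weights = mu * (1.0 - mu)
        gradient = X.T @ (y - mu)
        hessian = X.T @ (X * weights[:, None])
        beta = beta + np.linalg.solve(hessian, gradient)
    return float(beta[0]), float(beta[1])


if __name__ == "__main__":
    # Simulated test-sample risks and outcomes (illustrative data only).
    rng = np.random.default_rng(0)
    risk = 1.0 / (1.0 + np.exp(-rng.normal(-2.0, 1.5, size=5000)))
    died = rng.binomial(1, risk)
    print("AUC:", auc(died, risk))
    print("Brier:", brier_score(died, risk))
    print("Calibration (intercept, slope):", calibration_intercept_slope(died, risk))
```

In a split-sample setting, each function would be applied to the predicted risks for the testing sample only, and the results averaged over the repeated splits.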

RESULTS

We found that XGBoost is highly discriminative, with an AUC of 0.9669, and outperforms conventional tree-based methods (classification tree and RF) in predicting COVID-19 mortality risk. Regression-based methods (logistic, GAM and LASSO) performed comparably to XGBoost, with slightly lower AUCs and higher Brier's scores.

CONCLUSIONS

XGBoost offers superior performance over conventional tree-based methods and a minor improvement over regression-based methods for predicting COVID-19 mortality risk in the study population.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6245/8627629/451a4ad37f8a/12874_2021_1441_Fig1_HTML.jpg
