Generalizable prediction of COVID-19 mortality on worldwide patient data.

Authors

Edelson Maxim, Kuo Tsung-Ting

Affiliations

UCSD Department of Computer Science and Engineering, University of California San Diego, La Jolla, California, USA.

UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, California, USA.

Publication information

JAMIA Open. 2022 May 25;5(2):ooac036. doi: 10.1093/jamiaopen/ooac036. eCollection 2022 Jul.

Abstract

OBJECTIVE

Predicting mortality for patients with coronavirus disease 2019 (COVID-19) is critical for early-stage care and intervention. Existing studies mainly built models on datasets with limited geographical range or size. In this study, we developed COVID-19 mortality prediction models on worldwide, large-scale "sparse" data and on a "dense" subset of the data.

MATERIALS AND METHODS

We evaluated 6 classifiers: logistic regression (LR), support vector machine (SVM), random forest (RF), multilayer perceptron (MLP), AdaBoost (AB), and Naive Bayes (NB). We also conducted temporal analysis and calibrated our models using isotonic regression.
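To make the calibration step concrete, the following is a minimal sketch of isotonic regression fitted with the pool-adjacent-violators algorithm. It is not the authors' code (that is available at the Zenodo link in the Conclusion); the function names `isotonic_fit` and `isotonic_predict` and the toy scores and labels are hypothetical and chosen only for illustration.

```python
from bisect import bisect_left
from itertools import groupby


def isotonic_fit(scores, labels):
    """Fit a non-decreasing step function from raw scores to probabilities.

    Hypothetical helper: pool-adjacent-violators on (score, label) pairs.
    Returns (upper_bounds, values), one entry per pooled block.
    """
    pairs = sorted(zip(scores, labels))
    blocks = []  # each block: [label_sum, count, highest_score_in_block]
    for score, group in groupby(pairs, key=lambda p: p[0]):
        ys = [y for _, y in group]  # tied scores are pooled up front
        blocks.append([float(sum(ys)), len(ys), score])
        # Merge backwards while an earlier block has a higher mean label.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            label_sum, count, upper = blocks.pop()
            blocks[-1][0] += label_sum
            blocks[-1][1] += count
            blocks[-1][2] = upper
    return [b[2] for b in blocks], [b[0] / b[1] for b in blocks]


def isotonic_predict(model, score):
    """Map a raw score to the calibrated value of the block that covers it."""
    uppers, values = model
    # Scores above the largest training score are clipped to the last block.
    idx = min(bisect_left(uppers, score), len(values) - 1)
    return values[idx]


# Toy data (made up): raw classifier scores and observed outcomes (1 = death).
raw_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
outcomes = [0, 1, 0, 0, 1, 1]
model = isotonic_fit(raw_scores, outcomes)
calibrated = [isotonic_predict(model, s) for s in raw_scores]
```

In this sketch the calibrated values never decrease as the raw score increases, which is the defining constraint of isotonic regression. In practice the mapping would be fitted on held-out data rather than on the training set.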

RESULTS

The results showed that AB outperformed the other classifiers for the sparse dataset, while LR provided the highest-performing results for the dense dataset (with area under the receiver operating characteristic curve, or AUC ≈ 0.7 for the sparse dataset and AUC = 0.963 for the dense one). We also identified impactful features such as symptoms, countries, age, and the date of death/discharge. All our models are well-calibrated (P > .1).
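For readers unfamiliar with the AUC metric reported here, the sketch below computes it directly from its pairwise definition: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name `roc_auc` and the toy data are hypothetical, and this is an illustration of the metric rather than the evaluation code used in the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve from pairwise comparisons (hypothetical helper).

    Quadratic in the number of cases, so only suitable for small examples.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy data (made up): 1 = died, 0 = survived, with predicted risk scores.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a scale for reading the reported values of roughly 0.7 (sparse) and 0.963 (dense).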

DISCUSSION

Our results highlight the tradeoff of using sparse training data to increase generalizability versus training on denser data, which produces higher discrimination results. We found that covariates such as patient information on symptoms, countries (where the case was reported), age, and the date of discharge from the hospital or death were the most important for mortality prediction.

CONCLUSION

This study is a stepping-stone towards improving healthcare quality during the COVID-19 era and potentially other pandemics. Our code is publicly available at: https://doi.org/10.5281/zenodo.6336231.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dd0f/9154018/3c6be47a4a57/ooac036f1.jpg