College of Computer and Data Science, Fuzhou University, Fuzhou, China.
Centre for Big Data Research in Burns and Trauma, Fuzhou University, Fuzhou, China.
Front Cell Infect Microbiol. 2022 Apr 19;12:838749. doi: 10.3389/fcimb.2022.838749. eCollection 2022.
Coronavirus disease 2019 (COVID-19) has spread worldwide and affected many people's lives. COVID-19 resembles other types of pneumonia in some characteristics and differs in others, which initially made it difficult for doctors to distinguish and understand them. Here we present a retrospective analysis of COVID-19 and other types of pneumonia that combines COVID-19 clinical data with the eICU and MIMIC-III databases. Machine learning models, including logistic regression, random forest, XGBoost, and deep learning neural networks, were developed to predict the severity of COVID-19 infection and the mortality of pneumonia patients in intensive care units (ICUs). Statistical analysis and feature interpretation, including analysis of two-level attention mechanisms over both temporal and non-temporal features, were used to understand the associations between clinical variables and disease outcomes. On the COVID-19 data, the XGBoost model achieved the best performance on the test set (AUROC = 1.000 and AUPRC = 0.833). On the MIMIC-III and eICU pneumonia datasets, our deep learning model (Bi-LSTM_Attn) was able to identify clinical variables associated with death in pneumonia patients (AUROC = 0.924 and AUPRC = 0.802 with a 24-hour observation window and a 12-hour prediction window). The results highlight clinical indicators, such as lymphocyte count, that may help doctors predict disease progression and outcomes for both COVID-19 and other types of pneumonia.
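The two metrics reported above, AUROC and AUPRC, can be illustrated with a minimal sketch. It is not the study's code: the function names `auroc` and `average_precision` and the toy labels and scores are hypothetical, and AUPRC is estimated here by average precision, which is an assumption since the abstract does not state which estimator was used.

```python
from typing import Sequence


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outscores a random negative.

    A tie between a positive and a negative counts as half a win.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Mean of the precision values at each rank where a positive is retrieved.

    Assumes untied scores; tied scores are ordered arbitrarily by the sort.
    """
    n_pos = sum(1 for y in y_true if y == 1)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(y_score, y_true), key=lambda t: t[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / n_pos


# Toy example (hypothetical values, unrelated to the study data):
# 1 = patient died, 0 = patient survived; scores are predicted risks.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
print(f"AUROC = {auroc(labels, scores):.3f}")
print(f"AUPRC (average precision) = {average_precision(labels, scores):.3f}")
```

AUROC measures how well the model ranks positive cases above negative ones regardless of class balance, whereas AUPRC is sensitive to the proportion of positive cases, which is why the two are usually reported together for imbalanced outcomes such as ICU mortality.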