Pan Pan, Li Yichao, Xiao Yongjiu, Han Bingchao, Su Longxiang, Su Mingliang, Li Yansheng, Zhang Siqi, Jiang Dapeng, Chen Xia, Zhou Fuquan, Ma Ling, Bao Pengtao, Xie Lixin
Chinese PLA General Hospital, Medical School Of Chinese PLA, College of Pulmonary and Critical Care Medicine, Beijing, China.
DHC Mediway Technology Co Ltd, Beijing, China.
J Med Internet Res. 2020 Nov 11;22(11):e23128. doi: 10.2196/23128.
Patients with COVID-19 in the intensive care unit (ICU) have a high mortality rate, and methods to assess patients' prognosis early and administer precise treatment are of great significance.
The aim of this study was to use machine learning to construct a model for the analysis of risk factors and prediction of mortality among ICU patients with COVID-19.
In this study, 123 patients with COVID-19 in the ICU of Huoshenshan (Vulcan Hill) Hospital in Wuhan were retrospectively selected from the database, and the data were randomly divided into a training data set (n=98) and a test data set (n=25) at a 4:1 ratio. Significance tests, correlation analysis, and factor analysis were used to screen 100 potential risk factors individually. Conventional logistic regression and four machine learning algorithms were used to construct the risk prediction model for the prognosis of patients with COVID-19 in the ICU. The performance of these models was measured by the area under the receiver operating characteristic curve (AUC). The risk prediction model was interpreted and evaluated using calibration curves, SHapley Additive exPlanations (SHAP), and Local Interpretable Model-Agnostic Explanations (LIME), among other methods, to assess its stability and reliability. The outcome was ICU death as recorded in the database.
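As an illustration of two steps described above, the 4:1 random split and the AUC metric, here is a minimal sketch in plain Python. It is not the study's code: the function names, the random seed, and the toy labels and scores are assumptions made for illustration only, and the AUC is computed with the standard pairwise-ranking definition rather than any particular library.

```python
import random


def split_4_to_1(n_patients, seed=0):
    """Randomly split patient indices into training and test sets at a 4:1 ratio."""
    indices = list(range(n_patients))
    random.Random(seed).shuffle(indices)  # seed is an arbitrary illustrative choice
    n_train = int(n_patients * 0.8)       # 123 patients give 98 for training
    return indices[:n_train], indices[n_train:]


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Split sizes matching the abstract: 123 patients in total.
train_idx, test_idx = split_4_to_1(123)

# Hypothetical labels (1 = died in ICU) and predicted risks, not study data.
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.4, 0.5, 0.2, 0.7, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
```

With only 25 patients in the test set, an AUC estimated this way has a wide confidence interval, which is worth keeping in mind when reading the reported values.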
Layer-by-layer screening of the 100 potential risk factors identified 8 important risk factors that were included in the risk prediction model: lymphocyte percentage, prothrombin time, lactate dehydrogenase, total bilirubin, eosinophil percentage, creatinine, neutrophil percentage, and albumin level. An eXtreme Gradient Boosting (XGBoost) model built on these 8 risk factors showed the best discrimination both in 5-fold cross-validation on the training set (AUC=0.86) and in the validation cohort (AUC=0.92). The calibration curve showed that the risk predicted by the model was in good agreement with the actual risk. The SHAP and LIME algorithms were used to explain both the feature contributions and the individual predictions of the black-box XGBoost model. The model was also translated into a web-based risk calculator that is freely available for public use.
The 8-factor XGBoost model predicts the risk of death in ICU patients with COVID-19 well; it shows preliminary evidence of stability and can be used effectively to predict prognosis in these patients.
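To make the idea behind SHAP-style feature attribution concrete, the sketch below computes exact Shapley values for a single sample by enumerating all feature subsets. It is an illustration only, under stated assumptions: the `toy_risk` function, its coefficients, and the sample and baseline values are hypothetical and are not the study's XGBoost model or data, and exact enumeration is used here in place of the faster tree-specific algorithms that SHAP tooling typically applies to gradient-boosted models.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample.

    Features absent from a coalition are replaced by their baseline value.
    Cost grows exponentially with the number of features, so this is only
    practical for a handful of features.
    """
    n = len(x)
    features = list(range(n))
    phi = [0.0] * n

    def value(subset):
        mixed = [x[i] if i in subset else baseline[i] for i in features]
        return predict(mixed)

    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for combo in combinations(others, size):
                coalition = set(combo)
                phi[i] += weight * (value(coalition | {i}) - value(coalition))
    return phi


def toy_risk(v):
    """Hypothetical linear risk score over three of the eight reported factors."""
    lymphocyte_pct, ldh, albumin = v
    return 0.003 * ldh - 0.02 * lymphocyte_pct - 0.01 * albumin


# Hypothetical patient and reference ("baseline") values, not study data.
patient = [5.0, 600.0, 28.0]
baseline = [25.0, 250.0, 38.0]
contributions = shapley_values(toy_risk, patient, baseline)
```

The contributions sum to the difference between the prediction for the patient and the prediction for the baseline, which is the additivity property that lets a per-patient explanation be read as a breakdown of the predicted risk.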