Meng Yulan, Li Jiaxin, Shan Xinqiang, Lu Pengyu, Huang Wei
Department of Critical Care Medicine, Tacheng People's Hospital of Ili Kazak Autonomous Prefecture, Tacheng 834300, Xinjiang Uygur Autonomous Region, China.
Department of Disease Prevention and Hospital-Acquired Infection Control, First Hospital of Dalian Medical University, Dalian 116012, Liaoning, China.
Zhonghua Wei Zhong Bing Ji Jiu Yi Xue. 2025 Feb;37(2):170-176. doi: 10.3760/cma.j.cn121430-20240729-00640.
To explore the feasibility of incorporating simple bedside indicators into a mortality prediction model for elderly critically ill patients using interpretable machine learning algorithms, and thereby to provide a new approach to clinical disease assessment.
Elderly critically ill patients aged ≥ 65 years who were hospitalized in the intensive care unit (ICU) of Tacheng People's Hospital of Ili Kazak Autonomous Prefecture from June 2017 to May 2020 were retrospectively selected. Basic parameters were collected, including demographic characteristics, basic vital signs, and fluid intake and output within 24 hours after admission, as well as the Acute Physiology and Chronic Health Evaluation II (APACHE II) score, Glasgow Coma Scale (GCS) score, and Sequential Organ Failure Assessment (SOFA) score. Patients were divided into a survival group and a death group according to their in-hospital outcome. Four datasets were constructed: a baseline dataset (B), comprising age, body temperature, heart rate, pulse oxygen saturation, respiratory rate, mean arterial pressure, urine output, infusion volume, and crystalloid volume; a B+APACHE II dataset (BA); a B+GCS dataset (BG); and a B+SOFA dataset (BS). Three machine learning algorithms, logistic regression (LR), extreme gradient boosting (XGBoost), and gradient boosting decision tree (GBDT), were then used to develop a mortality prediction model on each of the four datasets. The feature importance histogram of each prediction model was drawn using the SHapley Additive exPlanations (SHAP) method. The area under the curve (AUC), accuracy, and F1 score of the models were compared to determine the optimal prediction model, for which a nomogram was then constructed.
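The dataset-by-algorithm design described above can be laid out as a short sketch. The feature identifiers (`spo2`, `apache_ii`, and so on) and the helper `model_grid` are hypothetical names chosen for illustration; the abstract gives only the variable descriptions, not the authors' code, and no model is actually trained here.

```python
from itertools import product

# Baseline bedside features (dataset B), as listed in the abstract.
# The identifiers are illustrative; the study's column names are not given.
BASELINE = [
    "age", "body_temperature", "heart_rate", "spo2", "respiratory_rate",
    "mean_arterial_pressure", "urine_output", "infusion_volume",
    "crystalloid_volume",
]

# Each extended dataset adds exactly one severity score to the baseline.
DATASETS = {
    "B": BASELINE,
    "BA": BASELINE + ["apache_ii"],
    "BG": BASELINE + ["gcs"],
    "BS": BASELINE + ["sofa"],
}

# The three algorithms named in the abstract (labels only, nothing is fitted).
ALGORITHMS = ["LR", "XGBoost", "GBDT"]


def model_grid():
    """Return every (algorithm, dataset name, feature list) combination."""
    return [
        (algo, name, DATASETS[name])
        for algo, name in product(ALGORITHMS, DATASETS)
    ]


if __name__ == "__main__":
    for algo, name, features in model_grid():
        print(f"{algo:8s} {name:3s} {len(features)} features")
```

Enumerating the grid this way yields three algorithms times four datasets, and it makes explicit that the BA, BG, and BS datasets differ from B by a single added score.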
A total of 392 patients were enrolled, including 341 in the survival group and 51 in the death group. There were statistically significant differences between the two groups in heart rate, pulse oxygen saturation, mean arterial pressure, infusion volume, crystalloid volume, and etiological distribution. The top three causes of death were shock, cerebral hemorrhage, and chronic obstructive pulmonary disease. Among the 12 prognostic models trained with the three machine learning algorithms, those based on the B dataset performed worst overall, whereas the LR model trained on the BA dataset performed best, with an AUC of 0.767 [95% confidence interval (95%CI) 0.692-0.836], an accuracy of 0.875 (95%CI 0.837-0.903), and an F1 score of 0.190. The top three variables in this model were crystalloid volume in the first 24 hours, heart rate, and mean arterial pressure. The nomogram of the model indicated that a total score between 150 and 230 was the appropriate range.
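An accuracy of 0.875 alongside an F1 score of 0.190 is typical of imbalanced outcomes, and a small calculation shows why. The sketch below uses only the group sizes reported above (341 survivors, 51 deaths). The abstract does not report the evaluation split or a confusion matrix, so the second set of counts is purely hypothetical and is not taken from the study. The function name `classification_metrics` is likewise an illustrative choice.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, F1) for the positive class, here in-hospital death."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    denom = 2 * tp + fp + fn
    # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when there are no positives
    # predicted or present.
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


SURVIVORS, DEATHS = 341, 51  # group sizes reported in the abstract

# A trivial rule that predicts "survive" for everyone.
baseline_acc, baseline_f1 = classification_metrics(
    tp=0, fp=0, fn=DEATHS, tn=SURVIVORS
)

# HYPOTHETICAL counts, for illustration only (not reported by the study):
# a model that flags 10 patients, 6 of whom actually died.
example_acc, example_f1 = classification_metrics(tp=6, fp=4, fn=45, tn=337)

if __name__ == "__main__":
    print(f"all-survive rule: accuracy={baseline_acc:.3f}, F1={baseline_f1:.3f}")
    print(f"hypothetical model: accuracy={example_acc:.3f}, F1={example_f1:.3f}")
```

With about 87% of patients surviving, accuracy is dominated by the majority class, whereas F1 depends only on how well deaths are detected. This is why AUC and F1 are reported alongside accuracy when comparing the models.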
An interpretable machine learning model that combines simple bedside parameters with the APACHE II score could effectively identify the risk of death in elderly critically ill patients.