Medical AI Research Center, Samsung Medical Center, Seoul, Republic of Korea.
Center for Data Science, New York University, New York, NY, United States.
J Med Internet Res. 2024 Jan 11;26:e52134. doi: 10.2196/52134.
Robust and accurate prediction of severity in patients with COVID-19 is crucial for triage decisions. Many previously proposed models had either a high risk of bias or low-to-moderate discrimination. Some also lacked clinical interpretability and were developed using data from the early pandemic period. Hence, there is a compelling need for improved prediction models with better clinical applicability.
The primary objective of this study was to develop and validate a machine learning-based Robust and Interpretable Early Triaging Support (RIETS) system that predicts severity progression (defined as any of the following events: intensive care unit admission, in-hospital death, requirement for mechanical ventilation, or requirement for extracorporeal membrane oxygenation) within 15 days of hospitalization, based on routinely available clinical and laboratory biomarkers.
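The composite outcome defined above can be expressed as a simple labeling rule. The sketch below assumes a hypothetical per-admission record with optional event dates (the field names are illustrative, not the study's schema) and treats "within 15 days" as inclusive of day 15, which the abstract does not specify.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

HORIZON_DAYS = 15  # prediction window stated in the abstract


@dataclass
class Admission:
    """Hypothetical admission record; None means the event did not occur."""
    admitted: date
    icu_admission: Optional[date] = None
    in_hospital_death: Optional[date] = None
    mechanical_ventilation: Optional[date] = None
    ecmo: Optional[date] = None


def severe_progression(a: Admission, horizon_days: int = HORIZON_DAYS) -> bool:
    """Return True if any component event occurs 0..horizon_days after admission.

    Assumption: the window is inclusive of day `horizon_days`.
    """
    events = (a.icu_admission, a.in_hospital_death,
              a.mechanical_ventilation, a.ecmo)
    return any(
        d is not None and 0 <= (d - a.admitted).days <= horizon_days
        for d in events
    )


if __name__ == "__main__":
    admit = date(2022, 3, 1)
    print(severe_progression(Admission(admit, icu_admission=date(2022, 3, 10))))  # True (day 9)
    print(severe_progression(Admission(admit, ecmo=date(2022, 3, 20))))  # False (day 19)
```

Any one qualifying event is enough to make the label positive, mirroring the "any of the following events" wording.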
We included data from 5945 hospitalized patients with COVID-19 from 19 hospitals in South Korea, collected between January 2020 and August 2022. For model development and external validation, the whole data set was partitioned into 2 independent cohorts by stratified random cluster sampling according to hospital type (general and tertiary care) and geographical location (metropolitan and nonmetropolitan). Machine learning models were trained and internally validated using cross-validation on the development cohort, and they were externally validated using bootstrap resampling on the external validation cohort. The best-performing model was selected primarily based on the area under the receiver operating characteristic curve (AUROC), and its robustness was evaluated through a risk-of-bias assessment. For model interpretability, we used Shapley value analysis and patient clustering.
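One way to read the split described above is to treat each hospital as a cluster and sample whole hospitals within each (hospital type, location) stratum, so that no hospital contributes patients to both cohorts. The following sketch assumes that reading; the function names, the hospital list, and the 30% external fraction are all hypothetical, since the abstract does not report the allocation ratio.

```python
import random
from collections import defaultdict


def stratified_cluster_split(hospitals, external_fraction=0.3, seed=0):
    """Split whole hospitals into development and external-validation sets.

    hospitals: iterable of (hospital_id, hospital_type, location) tuples.
    Within each (hospital_type, location) stratum, a random subset of
    hospitals is assigned to the external cohort; the rest go to development.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for hospital_id, hospital_type, location in hospitals:
        strata[(hospital_type, location)].append(hospital_id)

    development, external = set(), set()
    for key in sorted(strata):  # sorted iteration keeps results reproducible
        ids = sorted(strata[key])
        rng.shuffle(ids)
        n_external = round(len(ids) * external_fraction)
        external.update(ids[:n_external])
        development.update(ids[n_external:])
    return development, external


def assign_patients(patients, external_ids):
    """Route each patient to the cohort of the hospital that admitted them."""
    dev = [p for p in patients if p["hospital_id"] not in external_ids]
    ext = [p for p in patients if p["hospital_id"] in external_ids]
    return dev, ext


# Hypothetical hospitals (the study used 19; these are illustrative only).
HOSPITALS = [
    ("H01", "tertiary", "metropolitan"), ("H02", "tertiary", "metropolitan"),
    ("H03", "tertiary", "metropolitan"), ("H04", "general", "metropolitan"),
    ("H05", "general", "metropolitan"), ("H06", "general", "metropolitan"),
    ("H07", "tertiary", "nonmetropolitan"), ("H08", "tertiary", "nonmetropolitan"),
    ("H09", "general", "nonmetropolitan"), ("H10", "general", "nonmetropolitan"),
]

if __name__ == "__main__":
    dev_ids, ext_ids = stratified_cluster_split(HOSPITALS)
    print(sorted(dev_ids), sorted(ext_ids))
```

Sampling hospitals rather than individual patients keeps the external cohort genuinely unseen at the site level, while stratification keeps both cohorts represented across hospital types and regions.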
Our final model, RIETS, is a deep neural network based on 11 clinical and laboratory biomarkers that are readily available within the first day of hospitalization. The features predictive of severity included lactate dehydrogenase, age, absolute lymphocyte count, dyspnea, respiratory rate, diabetes mellitus, C-reactive protein, absolute neutrophil count, platelet count, white blood cell count, and peripheral oxygen saturation. RIETS demonstrated excellent discrimination (AUROC=0.937; 95% CI 0.935-0.938) and good calibration (integrated calibration index=0.041), satisfied all criteria for low risk of bias in a risk-of-bias assessment tool, and provided detailed interpretations of model parameters and patient clusters. In addition, RIETS maintained its predictive performance on Omicron cases (AUROC=0.903; 95% CI 0.897-0.910), suggesting potential transportability across variant periods.
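The discrimination and calibration figures above can be illustrated with a small standard-library sketch: AUROC computed from the Mann-Whitney rank statistic, a percentile-bootstrap 95% CI obtained by resampling patients (one common reading of bootstrap validation; the abstract gives neither the number of resamples nor the CI method), and a binned calibration measure. The last function is a swapped-in simplification, not the integrated calibration index itself, which is defined on a loess-smoothed calibration curve. All function names and the synthetic data are hypothetical.

```python
import random
import statistics
from collections import defaultdict


def auroc(y_true, y_score):
    """AUROC via the Mann-Whitney U statistic, with average ranks for ties."""
    pairs = sorted(zip(y_score, y_true))
    ranks = [0.0] * len(pairs)
    i = 0
    while i < len(pairs):
        j = i
        while j + 1 < len(pairs) and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # 1-based average rank of the tie group
        i = j + 1
    n_pos = sum(y for _, y in pairs)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative")
    rank_sum = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auroc_ci(y_true, y_score, n_boot=1000, seed=0):
    """Point AUROC plus a percentile-bootstrap 95% CI (resampling patients)."""
    point = auroc(y_true, y_score)  # also validates that both classes exist
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples containing a single class
            stats.append(auroc(ys, [y_score[i] for i in idx]))
    cuts = statistics.quantiles(stats, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    return point, cuts[0], cuts[-1]


def binned_calibration_error(y_true, y_prob, n_bins=10):
    """Binned stand-in for the integrated calibration index (ICI).

    The published ICI uses a loess-smoothed observed rate; here each
    patient's observed rate is the event rate in its equal-width
    probability bin, and |predicted - observed| is averaged over patients.
    """
    bins = defaultdict(list)
    for y, p in zip(y_true, y_prob):
        bins[min(int(p * n_bins), n_bins - 1)].append((y, p))
    total = 0.0
    for members in bins.values():
        observed = sum(y for y, _ in members) / len(members)
        total += sum(abs(p - observed) for _, p in members)
    return total / len(y_true)


if __name__ == "__main__":
    # Synthetic, illustrative data only; not the study's cohort.
    rng = random.Random(42)
    y = [1 if rng.random() < 0.2 else 0 for _ in range(300)]
    s = [min(max(rng.gauss(0.6 if yi else 0.3, 0.15), 0.0), 1.0) for yi in y]
    print(bootstrap_auroc_ci(y, s, n_boot=500))
    print(binned_calibration_error(y, s))
```

Lower calibration error means predicted probabilities track observed event rates more closely, which is the sense in which a small ICI indicates good calibration.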
RIETS was developed and validated to assist early triage by promptly predicting the severity of hospitalized patients with COVID-19. Its high performance and low risk of bias ensure reliable predictions. The use of a nationwide multicenter cohort for model development and validation suggests generalizability, and the use of routinely collected features may enable wide adoption. Interpretations of model parameters and patient clusters can promote clinical applicability. Together, we anticipate that RIETS will facilitate the patient triaging workflow and efficient resource allocation when incorporated into routine clinical practice.