Department of Engineering Science, University of Oxford, Oxford, UK.
Digital Health Research and Innovation Unit, Institute for Clinical Research, National Institutes of Health (NIH), Shah Alam, Malaysia.
Sci Rep. 2024 Jul 16;14(1):16387. doi: 10.1038/s41598-024-63212-7.
By September 2022, more than 600 million cases of SARS-CoV-2 infection had been reported globally, resulting in over 6.5 million deaths. However, COVID-19 mortality risk estimators are often developed from small, unrepresentative samples and carry methodological limitations. Pulmonary embolism (PE) is one of the most severe preventable complications of COVID-19, so predictive tools for PE in COVID-19 patients are highly important: early recognition can enable life-saving targeted anticoagulation therapy right at admission. Using a dataset of more than 800,000 COVID-19 patients from an international cohort, we propose a cost-sensitive gradient-boosted machine learning model that predicts the occurrence of PE and death at admission. Logistic regression, Cox proportional hazards models, and Shapley values were used to identify key predictors for PE and death. On a highly diverse, held-out test set, our prediction model achieved a test AUROC of 75.9% and 74.2%, and sensitivities of 67.5% and 72.7%, for PE and all-cause mortality respectively. The PE prediction model was also evaluated separately on patients in the UK and Spain, with test results of 74.5% AUROC and 63.5% sensitivity in the UK, and 78.9% AUROC and 95.7% sensitivity in Spain. Age, sex, region of admission, comorbidities (chronic cardiac and pulmonary disease, dementia, diabetes, hypertension, cancer, obesity, smoking), and symptoms (any, confusion, chest pain, fatigue, headache, fever, muscle or joint pain, shortness of breath) were the most important clinical predictors at admission. Age, overall presence of symptoms, shortness of breath, and hypertension were found to be key predictors for PE using our extreme gradient boosted model. This analysis, based on the largest global dataset to date for this set of problems, can inform hospital prioritisation policy and guide long-term clinical research and decision-making for COVID-19 patients globally.
Our machine learning model, developed from an international cohort, can serve to better regulate hospital risk prioritisation of at-risk patients.
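To make the reported metrics concrete, the following is a minimal sketch, in plain Python, of how AUROC and sensitivity are commonly computed, together with one widely used heuristic for cost-sensitive weighting of a rare outcome such as PE (the ratio of negative to positive cases). It is not the authors' pipeline: the function names, the toy labels and scores, and the 0.3 decision threshold are all assumptions chosen for illustration.

```python
def positive_class_weight(labels):
    """Ratio of negatives to positives: a common heuristic weight for the rare class.
    (Assumed heuristic; the abstract does not state how costs were set.)"""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0:
        raise ValueError("no positive cases in labels")
    return negatives / positives


def sensitivity(labels, predictions):
    """True-positive rate: fraction of actual positives that were flagged."""
    actual_positives = sum(labels)
    if actual_positives == 0:
        raise ValueError("no positive cases in labels")
    true_positives = sum(
        1 for y, p in zip(labels, predictions) if y == 1 and p == 1
    )
    return true_positives / actual_positives


def auroc(labels, scores):
    """Probability that a random positive case outscores a random negative case.
    Ties count as half a win (pairwise Mann-Whitney formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = outcome occurred (e.g. PE), 0 = it did not.
labels = [0, 0, 0, 0, 0, 0, 1, 0, 1, 0]
scores = [0.1, 0.2, 0.15, 0.3, 0.05, 0.4, 0.35, 0.25, 0.8, 0.12]

threshold = 0.3  # illustrative cut-off, not taken from the paper
predictions = [1 if s >= threshold else 0 for s in scores]

print("positive class weight:", positive_class_weight(labels))
print("sensitivity:", sensitivity(labels, predictions))
print("AUROC:", auroc(labels, scores))
```

The pairwise AUROC loop is quadratic in the number of cases, which is fine for a toy example; on a cohort of the size described here, a rank-based implementation from an established library would be the practical choice. Lowering the threshold raises sensitivity at the cost of more false positives, which is the trade-off a cost-sensitive model is tuned around.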