Schwab Patrick, DuMont Schütte August, Dietz Benedikt, Bauer Stefan
F Hoffmann-La Roche Ltd, Basel, Switzerland.
Eidgenössische Technische Hochschule Zürich, Zürich, Switzerland.
J Med Internet Res. 2020 Oct 6;22(10):e21439. doi: 10.2196/21439.
COVID-19 is a rapidly emerging respiratory disease caused by SARS-CoV-2. Due to the rapid human-to-human transmission of SARS-CoV-2, many health care systems are at risk of exceeding their capacities, in particular in terms of SARS-CoV-2 tests, hospital and intensive care unit (ICU) beds, and mechanical ventilators. Predictive algorithms could potentially ease the strain on health care systems by identifying those who are most likely to receive a positive SARS-CoV-2 test, be hospitalized, or be admitted to the ICU.
The aim of this study is to develop, study, and evaluate clinical predictive models that estimate, using machine learning and based on routinely collected clinical data, which patients are likely to receive a positive SARS-CoV-2 test or require hospitalization or intensive care.
Using a systematic approach to model development and optimization, we trained and compared various types of machine learning models, including logistic regression, neural networks, support vector machines, random forests, and gradient boosting. To evaluate the developed models, we performed a retrospective evaluation on demographic, clinical, and blood analysis data from a cohort of 5644 patients. In addition, we used causal explanations to determine how predictive each clinical feature was for each of the aforementioned clinical tasks.
Our experimental results indicate that our predictive models identified patients who test positive for SARS-CoV-2 a priori at a sensitivity of 75% (95% CI 67%-81%) and a specificity of 49% (95% CI 46%-51%), patients who are SARS-CoV-2 positive and require hospitalization with an area under the receiver operating characteristic curve (AUC) of 0.92 (95% CI 0.81-0.98), and patients who are SARS-CoV-2 positive and require critical care with an AUC of 0.98 (95% CI 0.95-1.00).
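For readers less familiar with the reported metrics, the following is a minimal sketch of how sensitivity, specificity, and AUC can be computed from binary labels and model scores, together with a percentile bootstrap confidence interval. The abstract does not state how the confidence intervals were obtained, so the bootstrap here is an assumption, and the function names, the 0.5 decision threshold, and the toy data are hypothetical and not taken from the study.

```python
import random

def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels and binary predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(y_true, scores, metric, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for metric(y_true, scores).

    Assumption: patients are resampled with replacement; resamples that
    contain only one class are skipped because the metric is undefined there.
    """
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        resampled_labels = [y_true[i] for i in idx]
        if len(set(resampled_labels)) < 2:
            continue
        stats.append(metric(resampled_labels, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int(alpha / 2 * n_resamples)]
    upper = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper

# Hypothetical toy data: 1 = positive outcome, 0 = negative outcome.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.35, 0.7, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

sens, spec = sensitivity_specificity(y_true, y_pred)
auc_value = auc(y_true, scores)
auc_low, auc_high = bootstrap_ci(y_true, scores, auc)
```

Sensitivity and specificity depend on the chosen decision threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason a model can pair a moderate specificity with a high AUC on a different task.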
Our results indicate that predictive models trained on routinely collected clinical data could be used to predict clinical pathways for COVID-19 and, therefore, help inform care and prioritize resources.