Layer 6 AI (Gutierrez, Volkovs, Poutanen); ICES (Volkovs, Watson, Rosella); Dalla Lana School of Public Health (Watson, Rosella), University of Toronto; Vector Institute (Rosella), Toronto, Ont.
CMAJ Open. 2021 Dec 21;9(4):E1223-E1231. doi: 10.9778/cmajo.20210036. Print 2021 Oct-Dec.
The COVID-19 pandemic has led to an increased demand for health care resources and, in some cases, shortage of medical equipment and staff. Our objective was to develop and validate a multivariable model to predict risk of hospitalization for patients infected with SARS-CoV-2.
We used routinely collected health records in a patient cohort to develop and validate our prediction model. This cohort included adult patients (age ≥ 18 yr) from Ontario, Canada, who tested positive for SARS-CoV-2 ribonucleic acid by polymerase chain reaction between Feb. 2 and Oct. 5, 2020, and were followed up through Nov. 5, 2020. Patients living in long-term care facilities were excluded, as they were all assumed to be at high risk of hospitalization for COVID-19. Risk of hospitalization within 30 days of diagnosis of SARS-CoV-2 infection was estimated via gradient-boosting decision trees, and variable importance was examined via Shapley values. We built the gradient-boosting model using the Extreme Gradient Boosting (XGBoost) algorithm and compared its performance against 4 empirical rules, based on age and number of comorbidities, that are commonly used for risk stratification.
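To make the comparison baseline concrete, the sketch below shows what one empirical rule based on age and number of comorbidities might look like, and how it could be scored. The abstract does not state the 4 rules that were used, so the thresholds (age 65 or older, 2 or more comorbidities), the `Patient` fields and the toy records are all hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Patient:
    # Hypothetical record layout; not the study's actual data schema.
    age: int
    n_comorbidities: int
    hospitalized: bool


def rule_flags_high_risk(p: Patient, min_age: int = 65,
                         min_comorbidities: int = 2) -> bool:
    """Flag a patient as high risk if either threshold is met.

    Both thresholds are assumptions chosen for illustration.
    """
    return p.age >= min_age or p.n_comorbidities >= min_comorbidities


def evaluate_rule(patients: List[Patient],
                  rule: Callable[[Patient], bool]) -> Dict[str, float]:
    """Return the fraction of patients flagged and the share of
    hospitalizations that fall among flagged patients (sensitivity)."""
    flagged = [p for p in patients if rule(p)]
    n_hospitalized = sum(p.hospitalized for p in patients)
    n_captured = sum(p.hospitalized for p in flagged)
    sensitivity = n_captured / n_hospitalized if n_hospitalized else float("nan")
    return {
        "fraction_flagged": len(flagged) / len(patients),
        "sensitivity": sensitivity,
    }


# Made-up toy cohort, only to show the mechanics.
toy_cohort = [
    Patient(70, 3, True),
    Patient(45, 0, False),
    Patient(66, 1, False),
    Patient(30, 2, True),
    Patient(50, 1, True),
    Patient(25, 0, False),
]

result = evaluate_rule(toy_cohort, rule_flags_high_risk)
print(result)
```

A rule of this kind yields a single yes/no flag per patient, whereas a gradient-boosting model yields a continuous risk score that can be thresholded at any desired fraction of the population.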
The cohort included 36 323 patients with 2583 hospitalizations (7.1%). Hospitalized patients had a higher median age (64 yr v. 43 yr), were more likely to be male (56.3% v. 47.3%) and had a higher median number of comorbidities (3, interquartile range [IQR] 2-6 v. 1, IQR 0-3) than nonhospitalized patients. Patients were split into development (n = 29 058, 80.0%) and held-out validation (n = 7265, 20.0%) cohorts. The gradient-boosting model achieved high discrimination (development cohort: area under the receiver operating characteristic curve across the 5 folds of 0.852; validation cohort: 0.8475) and strong calibration (slope = 1.01, intercept = -0.01). The patients who scored at the top 10% captured 47.4% of hospitalizations, and those who scored at the top 30% captured 80.6%.
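The two headline metrics above, area under the receiver operating characteristic curve and the share of hospitalizations captured by the highest-scoring patients, can be computed from predicted risk scores and observed outcomes. The following is a minimal sketch in plain Python, not the authors' evaluation code; the function names and the toy data are assumptions made for illustration.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via the rank-sum (Mann-Whitney U) identity.

    Tied scores receive their average rank.
    """
    n = len(y_true)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j to the end of the current group of tied scores.
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(y_true)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def capture_rate(y_true: Sequence[int], scores: Sequence[float],
                 top_fraction: float) -> float:
    """Share of all positive outcomes found among the top-scoring
    `top_fraction` of patients."""
    n_top = max(1, round(len(scores) * top_fraction))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    captured = sum(y_true[i] for i in order[:n_top])
    return captured / sum(y_true)


# Made-up toy data: 1 = hospitalized within 30 days, 0 = not hospitalized.
outcomes = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
risk_scores = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]

print(roc_auc(outcomes, risk_scores))
print(capture_rate(outcomes, risk_scores, 0.10))
print(capture_rate(outcomes, risk_scores, 0.30))
```

Read this way, the reported figure of 47.4% for the top 10% means that nearly half of all hospitalizations occurred among the tenth of patients with the highest predicted risk.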
We developed and validated an accurate risk stratification model using routinely collected health administrative data. We envision that modelling such risk stratification based on routinely collected health data could support management of COVID-19 on a population health level.