UW Medicine Research IT, University of Washington, Seattle, WA, United States of America.
Microsoft Research Cambridge, Cambridge, United Kingdom.
PLoS One. 2021 Oct 14;16(10):e0258339. doi: 10.1371/journal.pone.0258339. eCollection 2021.
Despite increased testing efforts and the deployment of vaccines, COVID-19 cases and deaths continue to rise at record rates. Health systems routinely collect clinical and non-clinical information in electronic health records (EHR), yet little is known about how minimal or intermediate subsets of EHR data can be leveraged to characterize a patient's SARS-CoV-2 pretest probability in support of interventional strategies.
We modeled patient pretest probability of SARS-CoV-2 test positivity and determined which features contributed to the prediction, both overall and for patients triaged through inpatient, outpatient, and telehealth/drive-up visit types. Data from the University of Washington (UW) Medicine Health System, excluding UW Medicine care providers and comprising patients predominantly residing in the Seattle Puget Sound area, were used to develop a gradient-boosting decision tree (GBDT) model. Patients were included if they had at least one visit prior to initial SARS-CoV-2 RT-PCR testing between January 1, 2020 and August 7, 2020. Model performance was assessed using the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPR). Feature contributions were assessed using SHapley Additive exPlanations (SHAP) values. The generalized pretest probability model using all available features achieved high overall discriminative performance (AUROC, 0.82). Performance among inpatients (AUROC, 0.86) was higher than for telehealth/drive-up testing (AUROC, 0.81) or outpatient testing (AUROC, 0.76). The two-week test positivity rate in the patient's ZIP code was the most informative feature for predicting test positivity across visit types. Geographic and sociodemographic factors were more important predictors of SARS-CoV-2 positivity than individual clinical characteristics.
Recent geographic and sociodemographic factors, routinely collected in EHRs though not routinely considered in clinical care, are the strongest predictors of the initial SARS-CoV-2 test result. These findings were consistent across visit types, informing our understanding of individual SARS-CoV-2 risk factors, with implications for the deployment of testing, outreach, and population-level prevention efforts.
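To make the evaluation metrics named above concrete, the following is a minimal sketch of how AUROC and AUPR could be computed separately for each visit type from predicted probabilities and observed test results. It is not the study's pipeline: the function names, the record layout, and the toy records are all hypothetical, and the pure-Python routines stand in for whatever library the authors used. AUPR is approximated here by average precision, one common step-wise estimate of the area under the precision-recall curve.

```python
from collections import defaultdict


def auroc(labels, scores):
    """AUROC as the probability that a random positive outscores a random
    negative (Mann-Whitney formulation); tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision: mean of the precision values observed at the rank
    of each positive, with cases ranked by descending score.
    Assumption: scores are untied; ties would make the ranking arbitrary."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    precision_sum = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precision_sum += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive label")
    return precision_sum / hits


def metrics_by_visit_type(records):
    """Group (visit_type, label, score) records and score each group."""
    groups = defaultdict(list)
    for visit_type, label, score in records:
        groups[visit_type].append((label, score))
    results = {}
    for visit_type, pairs in groups.items():
        labels = [y for y, _ in pairs]
        scores = [s for _, s in pairs]
        results[visit_type] = {
            "auroc": auroc(labels, scores),
            "aupr": average_precision(labels, scores),
        }
    return results


# Hypothetical toy records (visit type, test result, predicted probability).
# These are illustrative values only, not data from the study.
TOY_RECORDS = [
    ("inpatient", 1, 0.9), ("inpatient", 0, 0.2),
    ("inpatient", 1, 0.7), ("inpatient", 0, 0.4),
    ("outpatient", 1, 0.6), ("outpatient", 0, 0.7),
    ("outpatient", 0, 0.1), ("outpatient", 1, 0.8),
]

if __name__ == "__main__":
    for visit_type, m in metrics_by_visit_type(TOY_RECORDS).items():
        print(f"{visit_type}: AUROC={m['auroc']:.2f}, AUPR={m['aupr']:.2f}")
```

Scoring each visit type on its own, rather than pooling all patients, is what allows a comparison such as the inpatient versus outpatient AUROC reported above. With real data the pairwise loop in `auroc` would be slow, and a rank-based or library implementation would be the more practical choice.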