Fresenius Medical Care, Global Medical Office, Waltham, Massachusetts.
Division of Nephrology, Maastricht University Medical Center, Maastricht, The Netherlands.
Kidney360. 2021 Jan 13;2(3):456-468. doi: 10.34067/KID.0003802020. eCollection 2021 Mar 25.
We developed a machine learning (ML) model that predicts the risk of a patient on hemodialysis (HD) having an undetected SARS-CoV-2 infection that is identified in the subsequent ≥3 days.
As part of a healthcare operations effort, we used patient data from a national network of dialysis clinics (February-September 2020) to develop an ML model (XGBoost) that uses 81 variables to predict the likelihood of an adult patient on HD having an undetected SARS-CoV-2 infection that is identified in the subsequent ≥3 days. We used a 60%:20%:20% randomized split of COVID-19-positive samples for the training, validation, and testing datasets.
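The 60%:20%:20% randomized split described above can be illustrated with a minimal sketch. It is not the authors' code: the function name `split_60_20_20`, the `seed` parameter, and the use of integer sample IDs are assumptions made for illustration, and the abstract does not say how controls were allocated across the three datasets.

```python
import random


def split_60_20_20(sample_ids, seed=0):
    """Shuffle sample IDs and split them 60%/20%/20%.

    Hypothetical helper: returns (train, validation, test) lists.
    Integer arithmetic avoids floating-point rounding in the cut points.
    """
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)  # seeded shuffle for reproducibility
    n_train = len(ids) * 6 // 10
    n_val = len(ids) * 2 // 10
    train = ids[:n_train]
    validation = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]  # remainder goes to the testing dataset
    return train, validation, test


# Example with 100 made-up sample IDs
train, validation, test = split_60_20_20(range(100), seed=42)
print(len(train), len(validation), len(test))
```

Every sample lands in exactly one of the three lists, so no sample is shared between training and testing.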
We used a select cohort of 40,490 patients on HD to build the ML model (11,166 patients who were COVID-19 positive and 29,324 patients who were unaffected controls). The prevalence of COVID-19 in the cohort (28% COVID-19 positive) was by design higher than that in the HD population. The prevalence of COVID-19 was set to 10% in the testing dataset to approximate the prevalence observed in the national HD population. The threshold for classifying observations as positive or negative was set at 0.80 to minimize false positives. In the testing dataset, the model had a precision of 0.52, a recall of 0.07, and a lift of 5.3. The area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC) for the model were 0.68 and 0.24, respectively, in the testing dataset. The top predictors of a patient on HD having a SARS-CoV-2 infection were the change in interdialytic weight gain from the previous month, mean pre-HD body temperature in the prior week, and the change in post-HD heart rate from the previous month.
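To make the relationship between the reported threshold, precision, recall, and lift concrete, the following minimal sketch computes them from predicted scores and true labels. It rests on assumptions the abstract does not state: lift is taken as precision divided by prevalence (a common definition), a score equal to the threshold counts as positive, and the function name `threshold_metrics` and the toy data are hypothetical.

```python
def threshold_metrics(scores, labels, threshold=0.80):
    """Compute precision, recall, and lift at a classification threshold.

    scores: predicted probabilities; labels: 1 = positive, 0 = control.
    Assumption: lift = precision / prevalence of positives in the data.
    """
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
    fp = sum(1 for p, y in zip(predicted, labels) if p and y == 0)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y == 1)
    prevalence = sum(labels) / len(labels)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    lift = precision / prevalence if prevalence else 0.0
    return {"precision": precision, "recall": recall, "lift": lift}


# Toy data (made up for illustration, not from the study)
scores = [0.95, 0.85, 0.60, 0.40, 0.30, 0.20, 0.10, 0.90, 0.15, 0.05]
labels = [1, 0, 1, 0, 0, 0, 0, 1, 0, 0]
print(threshold_metrics(scores, labels))
```

Under this definition, a precision of 0.52 at a prevalence of 10% gives a lift of about 5.2, close to the reported 5.3. The small gap may be due to rounding of the reported values.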
The developed ML model appears suitable for predicting patients on HD at risk of having COVID-19 at least 3 days before there would be a clinical suspicion of the disease.