Predicting SARS-CoV-2 infection among hemodialysis patients using deep neural network methods.

Affiliations

Department of Statistics and Applied Probability, University of California, Santa Barbara, CA, USA.

Renal Research Institute, New York, USA.

Publication information

Sci Rep. 2024 Oct 9;14(1):23588. doi: 10.1038/s41598-024-74967-4.

DOI:10.1038/s41598-024-74967-4

PMID:39384931

Original article:https://pmc.ncbi.nlm.nih.gov/articles/PMC11464512/

Abstract

COVID-19 causes higher morbidity and mortality among dialysis patients than in the general population. Identifying infected patients early with the support of predictive models helps dialysis centers implement concerted procedures (e.g., temperature screenings, universal masking, isolation treatments) to control the spread of SARS-CoV-2 and mitigate outbreaks. We collect data from multiple sources, including demographic, clinical, treatment, laboratory, vaccination, socioeconomic status, and COVID-19 surveillance data. Previous early prediction models, such as logistic regression, SVM, and XGBoost, require sophisticated feature engineering and leave room for improvement in prediction performance. We create deep learning models, including Recurrent Neural Networks (RNN) and Convolutional Neural Networks (CNN), to predict SARS-CoV-2 infections during the incubation period. Our study shows that deep learning models with minimal feature engineering can identify infected patients more accurately than previously built models. Our Long Short-Term Memory (LSTM) model consistently performed well, with an AUC exceeding 0.80, peaking at 0.91 in August 2021. The CNN model also demonstrated strong results with an AUC above 0.75. Both models outperformed the previous best XGBoost models by over 0.10 in AUC. Prediction accuracy declined as the pandemic evolved, dropping to approximately 0.75 between September 2021 and January 2022. At a fixed 20% false positive rate, our LSTM and CNN models identified 66% and 64% of positive cases, respectively, significantly outperforming XGBoost models at 42%. We also identify key features for dialysis patients by calculating the gradient of the output with respect to the input features. Closely monitoring these factors can enable earlier diagnosis and care for dialysis patients, leading to less severe outcomes.
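The gradient-based feature ranking mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: a single logistic unit stands in for the trained LSTM or CNN, and the feature names, weights, and input values are hypothetical. The idea is only that features whose small changes move the predicted probability most receive the largest gradient magnitude.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, x):
    """Toy stand-in for a trained network: one logistic unit."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def input_gradient(weights, bias, x):
    """Analytic gradient of the predicted probability w.r.t. each input.

    For p = sigmoid(w . x + b), dp/dx_i = p * (1 - p) * w_i.
    """
    p = predict(weights, bias, x)
    return [p * (1.0 - p) * w for w in weights]

def finite_difference_gradient(weights, bias, x, eps=1e-6):
    """Central-difference check of the analytic gradient."""
    grads = []
    for i in range(len(x)):
        hi, lo = list(x), list(x)
        hi[i] += eps
        lo[i] -= eps
        diff = predict(weights, bias, hi) - predict(weights, bias, lo)
        grads.append(diff / (2 * eps))
    return grads

def rank_features(names, grads):
    """Sort features by absolute gradient, largest first."""
    return sorted(zip(names, grads), key=lambda t: abs(t[1]), reverse=True)

# Hypothetical features, weights, and one standardized patient record.
FEATURES = ["temperature", "heart_rate", "albumin"]
WEIGHTS = [1.5, 0.4, -0.8]
BIAS = -0.2
x = [0.3, -0.1, 0.5]

grads = input_gradient(WEIGHTS, BIAS, x)
ranking = rank_features(FEATURES, grads)
for name, g in ranking:
    print(f"{name}: {g:+.4f}")
```

With a deep network the gradient would normally come from automatic differentiation rather than a closed-form expression, and it would typically be averaged over many patients and time steps before features are ranked.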
Our research highlights the effectiveness of deep neural networks in analyzing longitudinal data, especially in predicting COVID-19 infections during the crucial incubation period. These deep network approaches surpass traditional methods relying on aggregated variable means, significantly improving the accurate identification of SARS-CoV-2 infections.
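The sensitivity figures quoted at a 20% false positive rate correspond to one operating point on the ROC curve. The sketch below shows one way to compute that quantity from labels and model scores. The function name `tpr_at_fpr` and the example data are hypothetical, and the paper may have selected its threshold differently.

```python
def tpr_at_fpr(labels, scores, max_fpr=0.20):
    """Highest true positive rate reachable with FPR <= max_fpr.

    labels: 1 for infected, 0 for not infected.
    scores: predicted risk, higher means more likely infected.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")

    best_tpr = 0.0
    # Try every observed score as a threshold (predict positive if score >= t).
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        if fp / n_neg <= max_fpr:
            best_tpr = max(best_tpr, tp / n_pos)
    return best_tpr

# Hypothetical scores for 4 infected and 5 uninfected patients.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.55, 0.3, 0.7, 0.5, 0.4, 0.2, 0.1]

print(tpr_at_fpr(labels, scores, max_fpr=0.20))  # 0.75
```

Fixing the false positive rate makes models comparable at the same screening burden: each model flags the same share of uninfected patients, so the difference in sensitivity reflects how many infected patients each one catches.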
