Prediction of acute and chronic kidney diseases during the post-covid-19 pandemic with machine learning models: utilizing national electronic health records in the US.

Authors

Zhang Yue, Ghahramani Nasrollah, Li Runjia, Chinchilli Vernon M, Ba Djibril M

Affiliations

Department of Public Health Sciences, Penn State College of Medicine, Hershey, PA, USA.

Department of Public Health Sciences, Penn State College of Medicine, Hershey, PA, USA; Department of Medicine, Penn State College of Medicine, Hershey, PA, USA.

Publication information

EBioMedicine. 2025 May;115:105726. doi: 10.1016/j.ebiom.2025.105726. Epub 2025 Apr 26.

Abstract

BACKGROUND

COVID-19 has been linked to acute kidney injury (AKI) and chronic kidney disease (CKD), but machine learning (ML) models predicting these risks post-pandemic have been absent. We aimed to use large electronic health records (EHR) and ML algorithms to predict the incidence of AKI and CKD during the post-pandemic period, assess the necessity of including COVID-19 infection history as a predictor, and develop a practical webpage application for clinical use.

METHODS

National EHR data from TriNetX, emulating a prospective cohort of 104,565 patients from 07/01/2022 to 03/31/2024, were used. A total of 69 baseline variables were included, with demographics, comorbidities, lab test results, vital signs, medication histories, hospitalization visits, and COVID-19-related variables. Prediction windows of 1 month and 1 year were defined to assess AKI and CKD incidence. Eight machine learning models, primarily including extreme gradient boosting (XGBoost), neural network, and random forest (RF), were applied. Cross-validation and model tuning were conducted during the training process. Model performance was evaluated using six metrics, including the area under the receiver-operating-characteristic curve (AUROC). A combination of model-driven, data-driven, and clinical-driven methods was employed to identify the final models. An application with the final models was built using the R Shiny framework.
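The prediction windows described above can be illustrated with a minimal labeling sketch. The abstract does not say how "1 month" and "1 year" were converted to days or how index dates were assigned, so the 30-day and 365-day lengths, the function name `label_incident`, and the example patient are all assumptions made for illustration only.

```python
from datetime import date, timedelta
from typing import Optional

# Assumed window lengths in days; the abstract only states "1 month" and "1 year".
WINDOWS = {"1_month": 30, "1_year": 365}


def label_incident(index_date: date,
                   diagnosis_date: Optional[date],
                   window_days: int) -> int:
    """Return 1 if the first diagnosis falls after the index date and
    no later than index_date + window_days; otherwise return 0."""
    if diagnosis_date is None:
        # No diagnosis recorded during follow-up.
        return 0
    window_end = index_date + timedelta(days=window_days)
    return int(index_date < diagnosis_date <= window_end)


# Hypothetical patient: index date at the cohort start, AKI recorded 45 days later.
index = date(2022, 7, 1)
aki_date = index + timedelta(days=45)
labels = {name: label_incident(index, aki_date, days)
          for name, days in WINDOWS.items()}
print(labels)  # {'1_month': 0, '1_year': 1}
```

Under these assumptions, one patient contributes a separate binary outcome per window, which is why a single event can be a negative for the 1-month model and a positive for the 1-year model.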

FINDINGS

The final models incorporated 9 variables, chiefly eGFR, the number of inpatient visits, and the number of COVID-19 infections. XGBoost demonstrated the best performance for predicting the incidence of AKI in 1 month (AUROC = 0.803), AKI in 1 year (AUROC = 0.799), and CKD in 1 year (AUROC = 0.894). Random forest (RF) was selected for predicting the incidence of CKD in 1 month (AUROC = 0.896). A comparison of AUROC between models with and without the COVID-19 infection variable confirmed its importance as a predictor. The final models were translated into a convenient tool to facilitate their use in clinical settings.
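To make the AUROC comparison concrete, here is a small sketch of how AUROC can be computed and compared between two sets of risk scores. It is not the authors' procedure, which the abstract does not describe. The labels and scores are made-up toy values rather than study data, and `auroc`, `scores_with`, and `scores_without` are hypothetical names.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy outcome labels and risk scores (invented for illustration).
y = [1, 1, 1, 0, 0, 0, 0, 0]
scores_with = [0.9, 0.7, 0.45, 0.5, 0.4, 0.3, 0.2, 0.1]    # model with the extra predictor
scores_without = [0.9, 0.4, 0.6, 0.5, 0.7, 0.3, 0.2, 0.1]  # model without it

auc_with = auroc(y, scores_with)
auc_without = auroc(y, scores_without)
print(f"with: {auc_with:.3f}  without: {auc_without:.3f}  "
      f"difference: {auc_with - auc_without:.3f}")
```

The pairwise form is quadratic in sample size, so for a cohort of the size reported here a rank-based implementation from an established library would be the practical choice. The sketch only shows what the metric measures.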

INTERPRETATION

Our study demonstrates the applicability of using large national EHR data in developing high-performance machine learning models to predict AKI and CKD risks in the post-COVID-19 period. Incorporating the number of COVID-19 infections in the past year showed improved prediction performance and should be considered in future models for kidney disease prediction. A user-friendly application was created to support clinicians in risk assessment and surveillance.

FUNDING

Artificial Intelligence and Biomedical Informatics Pilot Funding, Penn State College of Medicine.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2ebc/12056805/425a69f15f28/gr1.jpg
