Suppr超能文献

长新冠预测模型。

Predictive models of long COVID.

机构信息

Department of Computer Science, Virginia Polytechnic Institute and State University (Virginia Tech), Blacksburg, VA, 24061, USA.

The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA.

出版信息

EBioMedicine. 2023 Oct;96:104777. doi: 10.1016/j.ebiom.2023.104777. Epub 2023 Sep 4.

Abstract

BACKGROUND

The cause and symptoms of long COVID are poorly understood. It is challenging to predict whether a given COVID-19 patient will develop long COVID in the future.

METHODS

We used electronic health record (EHR) data from the National COVID Cohort Collaborative to predict the incidence of long COVID. We trained two machine learning (ML) models - logistic regression (LR) and random forest (RF). Features used to train predictors included symptoms and drugs ordered during acute infection, measures of COVID-19 treatment, pre-COVID comorbidities, and demographic information. We assigned the 'long COVID' label to patients diagnosed with the U09.9 ICD10-CM code. The cohorts included patients with (a) EHRs reported from data partners using U09.9 ICD10-CM code and (b) at least one EHR in each feature category. We analysed three cohorts: all patients (n = 2,190,579; diagnosed with long COVID = 17,036), inpatients (149,319; 3,295), and outpatients (2,041,260; 13,741).

FINDINGS

LR and RF models yielded median AUROC of 0.76 and 0.75, respectively. Ablation study revealed that drugs had the highest influence on the prediction task. The SHAP method identified age, gender, cough, fatigue, albuterol, obesity, diabetes, and chronic lung disease as explanatory features. Models trained on data from one N3C partner and tested on data from the other partners had average AUROC of 0.75.

INTERPRETATION

ML-based classification using EHR information from the acute infection period is effective in predicting long COVID. SHAP methods identified important features for prediction. Cross-site analysis demonstrated the generalizability of the proposed methodology.

FUNDING

NCATS U24 TR002306, NCATS UL1 TR003015, Axle Informatics Subcontract: NCATS-P00438-B, NIH/NIDDK/OD, PSR2015-1720GVALE_01, G43C22001320007, and Director, Office of Science, Office of Basic Energy Sciences of the U.S. Department of Energy Contract No. DE-AC02-05CH11231.

摘要

背景

长新冠的病因和症状尚不清楚。预测给定的 COVID-19 患者将来是否会发展成长新冠具有挑战性。

方法

我们使用来自国家 COVID 队列协作的电子健康记录 (EHR) 数据来预测长新冠的发病率。我们训练了两个机器学习 (ML) 模型 - 逻辑回归 (LR) 和随机森林 (RF)。用于训练预测器的特征包括急性感染期间的症状和开的药物、COVID-19 治疗措施、预 COVID 合并症和人口统计信息。我们将 U09.9 ICD10-CM 代码诊断的患者分配给“长新冠”标签。队列包括 (a) 使用 U09.9 ICD10-CM 代码报告来自数据合作伙伴的 EHRs 的患者,以及 (b) 每个特征类别中至少有一个 EHR 的患者。我们分析了三个队列:所有患者 (n=2,190,579; 诊断为长新冠 = 17,036)、住院患者 (149,319; 3,295) 和门诊患者 (2,041,260; 13,741)。

结果

LR 和 RF 模型的中位数 AUROC 分别为 0.76 和 0.75。消融研究表明,药物对预测任务的影响最大。SHAP 方法确定了年龄、性别、咳嗽、疲劳、沙丁胺醇、肥胖、糖尿病和慢性肺病作为解释性特征。在一个 N3C 合作伙伴的数据上训练的模型并在另一个合作伙伴的数据上进行测试的模型的平均 AUROC 为 0.75。

解释

使用急性感染期的 EHR 信息进行基于机器学习的分类在预测长新冠方面是有效的。SHAP 方法确定了预测的重要特征。跨站点分析证明了所提出方法的通用性。

资助

NCATS U24 TR002306、NCATS UL1 TR003015、Axle Informatics 分包合同:NCATS-P00438-B、NIH/NIDDK/OD、PSR2015-1720GVALE_01、G43C22001320007 和能源部基础能源科学办公室主任,合同号 DE-AC02-05CH11231。

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a70d/10494314/7fb2bd433fad/gr1.jpg

文献检索

告别复杂PubMed语法,用中文像聊天一样搜索,搜遍4000万医学文献。AI智能推荐,让科研检索更轻松。

立即免费搜索

文件翻译

保留排版,准确专业,支持PDF/Word/PPT等文件格式,支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述,25分钟生成高质量综述,智能提取关键信息,辅助科研写作。

立即免费体验