Predicting Long COVID in the National COVID Cohort Collaborative Using Super Learner: Cohort Study.

Affiliations

Division of Biostatistics, University of California Berkeley School of Public Health, Berkeley, CA, United States.

Department of Anesthesia and Perioperative Care, University of California San Francisco, San Francisco, CA, United States.

Publication information

JMIR Public Health Surveill. 2024 Aug 15;10:e53322. doi: 10.2196/53322.

DOI: 10.2196/53322
PMID: 39146534
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11364083/
Abstract

BACKGROUND

Postacute sequelae of COVID-19 (PASC), also known as long COVID, is a broad grouping of long-term symptoms that follow acute COVID-19. These symptoms can occur across a range of biological systems, which makes it difficult to determine risk factors for PASC and the causal etiology of this disorder. Understanding which characteristics predict future PASC is valuable, as it can inform the identification of high-risk individuals and future preventive efforts. However, current knowledge regarding PASC risk factors is limited.

OBJECTIVE

Using a sample of 55,257 patients (at a ratio of 1 patient with PASC to 4 matched controls) from the National COVID Cohort Collaborative, as part of the National Institutes of Health Long COVID Computational Challenge, we sought to predict individual risk of PASC diagnosis from a curated set of clinically informed covariates. The National COVID Cohort Collaborative includes electronic health records for more than 22 million patients from 84 sites across the United States.

METHODS

We predicted individual PASC status from covariate information using Super Learner (an ensemble machine learning algorithm also known as stacking), which learned the optimal combination of gradient boosting and random forest algorithms to maximize the area under the receiver operating characteristic curve. We evaluated variable importance (Shapley values) at 3 levels: individual features, temporal windows, and clinical domains. We externally validated these findings using a holdout set of randomly selected study sites.
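
The stacking step described above can be illustrated with a minimal sketch. It assumes that out-of-fold predicted probabilities from two base learners are already available (the lists `preds_gb` and `preds_rf` below are invented toy values, not study data) and searches a grid of convex weights for the blend with the highest AUC. The function names and the grid search are illustrative assumptions; this is not the authors' implementation, which may use a different metalearner.

```python
def auc(labels, scores):
    """Rank-based AUC: chance that a random case outscores a random control (ties count 1/2)."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def best_convex_weight(labels, preds_a, preds_b, steps=20):
    """Grid-search the weight w in [0, 1] that maximizes the AUC of w*a + (1-w)*b."""
    best_w, best_auc = 0.0, -1.0
    for i in range(steps + 1):
        w = i / steps
        blended = [w * a + (1 - w) * b for a, b in zip(preds_a, preds_b)]
        score = auc(labels, blended)
        if score > best_auc:  # keep the first weight that reaches the best AUC
            best_w, best_auc = w, score
    return best_w, best_auc


# Toy data (hypothetical): 2 cases and 8 controls, mirroring a 1:4 ratio.
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
# Hypothetical out-of-fold probabilities from a gradient boosting model ...
preds_gb = [0.9, 0.3, 0.4, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
# ... and from a random forest model.
preds_rf = [0.3, 0.9, 0.1, 0.4, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]

weight, blended_auc = best_convex_weight(labels, preds_gb, preds_rf)
print("AUC gradient boosting:", auc(labels, preds_gb))
print("AUC random forest:   ", auc(labels, preds_rf))
print("Best weight and AUC: ", weight, blended_auc)
```

Because the grid includes the weights 0 and 1, the blended AUC can never be lower than that of either base learner on the data used to choose the weight. In practice the predictions fed to this step should come from cross-validation so that the weight is not chosen on data the base learners were fit to.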

RESULTS

We were able to predict individual PASC diagnoses accurately (area under the curve 0.874). The individual features of the length of observation period, number of health care interactions during acute COVID-19, and viral lower respiratory infection were the most predictive of subsequent PASC diagnosis. Temporally, we found that baseline characteristics were the most predictive of future PASC diagnosis, compared with characteristics immediately before, during, or after acute COVID-19. We found that the clinical domains of health care use, demographics or anthropometry, and respiratory factors were the most predictive of PASC diagnosis.
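
Grouped importance of this kind can be made concrete with the exact Shapley formula, treating each clinical domain as one "player". The sketch below is an assumption-laden illustration: the `toy_auc` table holds invented AUC values for each subset of three domains (they are not results from the study), and the exact enumeration shown here is only feasible for a handful of groups, so it should not be read as the estimator the authors used.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley value of each player for a set function `value(frozenset)`."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):  # k = size of the coalition that p joins
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


domains = ["health_care_use", "demographics", "respiratory"]

# Hypothetical AUC achieved by a model restricted to each subset of domains.
toy_auc = {
    frozenset(): 0.50,
    frozenset({"health_care_use"}): 0.70,
    frozenset({"demographics"}): 0.60,
    frozenset({"respiratory"}): 0.62,
    frozenset({"health_care_use", "demographics"}): 0.76,
    frozenset({"health_care_use", "respiratory"}): 0.78,
    frozenset({"demographics", "respiratory"}): 0.68,
    frozenset(domains): 0.82,
}

phi = shapley_values(domains, toy_auc.__getitem__)
for name, contribution in sorted(phi.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {contribution:.4f}")
```

By construction the contributions sum to the gap between the full-model value and the empty-model value, which is what makes Shapley values convenient for comparing features, temporal windows, or domains on a common scale.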

CONCLUSIONS

The methods outlined here provide an open-source, applied example of using Super Learner to predict PASC status using electronic health record data, which can be replicated across a variety of settings. Across individual predictors and clinical domains, we consistently found that factors related to health care use were the strongest predictors of PASC diagnosis. This indicates that any observational studies using PASC diagnosis as a primary outcome must rigorously account for heterogeneous health care use. Our temporal findings support the hypothesis that clinicians may be able to accurately assess the risk of PASC in patients before acute COVID-19 diagnosis, which could improve early interventions and preventive care. Our findings also highlight the importance of respiratory characteristics in PASC risk assessment.

INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID): RR2-10.1101/2023.07.27.23293272.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4585/11364083/d285ae8ee71b/publichealth_v10i1e53322_fig1.jpg
