Multi-scale Data Improves Performance of Machine Learning Model for Long COVID Prediction.

Author information

Wei Wei-Qi, Guardo Christopher, Zhang Xinmeng, Gandireddy Srushti, Yan Chao, Kerchberger Vern, Dickson Alyson, Pfaff Emily, Master Hiral, Basford Melissa, Chute Christopher, Tran Nguyen, Manusco Salvatore, Syed Toufeeq, Zhao Zhongming, Feng QiPing, Haendel Melissa, Lunt Christopher, Harris Paul, Li Lang, Ginsburg Geoffrey, Denny Joshua, Roden Dan

Affiliations

Vanderbilt University Medical Center.

University of North Carolina, USA.

Publication information

Res Sq. 2025 Aug 31:rs.3.rs-7234976. doi: 10.21203/rs.3.rs-7234976/v1.

Abstract

Long COVID affects a substantial proportion of the more than 778 million individuals infected with SARS-CoV-2, yet predictive models remain limited in scope. While existing efforts, such as the National COVID Cohort Collaborative (N3C), have leveraged electronic health record (EHR) data for risk prediction, accumulating evidence points to additional contributions from social, behavioral, and genetic factors. Using a diverse cohort of SARS-CoV-2-infected individuals (n > 17,200) from the NIH All of Us Research Program, we investigated whether integrating EHR data with survey-based and genomic information improves model performance. Our multi-scale approach outperformed the original EHR-only model (AUROC 0.736; 95% CI: 0.730, 0.741), achieving an AUROC of 0.748 (0.741, 0.755). Among the top predictors, active-duty service status, self-reported fatigue, and chr19:4719431:G:A_A were the most informative survey and genetic features. These findings highlight the importance of incorporating multi-scale data to improve risk stratification and inform personalized interventions for long COVID.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bf3d/12408029/024270943063/nihpp-rs7234976v1-f0001.jpg