Computable phenotypes to identify respiratory viral infections in the All of Us research program.

Author information

Waxse Bennett J, Bustos Carrillo Fausto Andres, Tran Tam C, Mo Huan, Ricotta Emily E, Denny Joshua C

Affiliations

National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA.

National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA.

Publication information

Sci Rep. 2025 May 28;15(1):18680. doi: 10.1038/s41598-025-02183-9.

Abstract

Electronic health records (EHRs) contain rich temporal data about respiratory viral infections, but methods to identify these infections from EHR data vary widely and lack robust validation. We developed computable phenotypes by integrating virus-specific International Classification of Diseases (ICD) billing codes, prescriptions, and laboratory results within 90-day episodes. Analysis of 265,222 participants with EHR data from the All of Us Research Program yielded national cohorts of varied size: large cohorts for SARS-CoV-2 (n = 28,729) and influenza (n = 19,784); medium cohorts for rhinovirus, human coronavirus, and respiratory syncytial virus (n = 1,161-1,620); and smaller cohorts for the other viruses (n = 238-486). Using laboratory results as a reference standard, phenotypes using virus-specific ICD codes and medications had variable sensitivity (8-67%) but high positive predictive value (PPV, 90-97%) for most viruses, while influenza virus and SARS-CoV-2 phenotypes had lower PPV (69-70%) that improved with the inclusion of additional ICD codes. Identified infections exhibited expected seasonal patterns matching CDC data. This integrated approach identified infections more effectively than individual components alone and demonstrated utility for severe infections in hospital settings. This method enables large-scale studies of host genetics, health disparities, and clinical outcomes across episodic diseases, with flexibility to optimize sensitivity or PPV depending on the specific research question.
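The episode grouping and the validation against laboratory results can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives only the 90-day window, so anchoring each episode at its first event, the source labels (`icd`, `rx`, `lab_pos`, `lab_neg`), and the function names `group_into_episodes` and `sensitivity_and_ppv` are all assumptions made for illustration. The sketch also assumes that only episodes with a laboratory result are scored.

```python
from datetime import date

EPISODE_DAYS = 90  # window length from the abstract; anchoring at the first event is an assumption


def group_into_episodes(events, window_days=EPISODE_DAYS):
    """Group one participant's events for one virus into episodes.

    events: iterable of (event_date, source) tuples, where source is one of
    the hypothetical labels "icd", "rx", "lab_pos", or "lab_neg".
    An event joins the current episode if it falls within window_days of
    that episode's first event; otherwise it starts a new episode.
    """
    episodes = []
    for event_date, source in sorted(events):
        if episodes and (event_date - episodes[-1]["start"]).days < window_days:
            episodes[-1]["sources"].add(source)
        else:
            episodes.append({"start": event_date, "sources": {source}})
    return episodes


def sensitivity_and_ppv(episodes):
    """Score an ICD-or-medication phenotype against lab results.

    Only episodes containing a lab result are evaluated (an assumption).
    Returns (sensitivity, ppv); either value is None if its denominator is zero.
    """
    tp = fp = fn = 0
    for episode in episodes:
        sources = episode["sources"]
        flagged = bool(sources & {"icd", "rx"})
        if "lab_pos" in sources:
            if flagged:
                tp += 1
            else:
                fn += 1
        elif "lab_neg" in sources and flagged:
            fp += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    ppv = tp / (tp + fp) if (tp + fp) else None
    return sensitivity, ppv


# Hypothetical events for a single participant and a single virus.
example_events = [
    (date(2022, 1, 1), "icd"),
    (date(2022, 1, 5), "lab_pos"),   # same episode as the ICD code
    (date(2022, 6, 1), "lab_pos"),   # new episode, lab-confirmed only
    (date(2022, 12, 1), "rx"),
    (date(2022, 12, 3), "lab_neg"),  # flagged by medication, lab negative
]
example_episodes = group_into_episodes(example_events)
example_sensitivity, example_ppv = sensitivity_and_ppv(example_episodes)
```

Adding further sources to the flagging set (for example, broader ICD codes) is one way such a sketch could be tuned toward sensitivity or toward PPV, in line with the flexibility the abstract describes.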

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/24ed/12120013/a9b70cf1b7ca/41598_2025_2183_Fig1_HTML.jpg
