Department of Pathology, Northwestern University Feinberg School of Medicine, Chicago, Illinois, United States of America.
Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, Michigan, United States of America.
PLoS Genet. 2020 Nov 11;16(11):e1009077. doi: 10.1371/journal.pgen.1009077. eCollection 2020 Nov.
Phenotypes extracted from Electronic Health Records (EHRs) are increasingly prevalent in genetic studies. EHRs contain hundreds of distinct clinical laboratory test results, providing a trove of health data beyond diagnoses. Such lab data are complex and lack a ubiquitous coding scheme, making them more challenging to work with than diagnosis data. Here we describe the first large-scale cross-health system genome-wide association study (GWAS) of EHR-based quantitative laboratory-derived phenotypes. We meta-analyzed 70 lab traits matched between the BioVU cohort from the Vanderbilt University Health System and the Michigan Genomics Initiative (MGI) cohort from Michigan Medicine. We show high replication of known associations for these traits, validating EHR-based measurements as high-quality phenotypes for genetic analysis. Notably, our analysis provides the first replication for 699 previous GWAS associations across 46 different traits. We discovered 31 novel associations at genome-wide significance for 22 distinct traits, including the first reported associations for two lab-based traits. We replicated 22 of these novel associations in an independent tranche of BioVU samples. The summary statistics for all association tests are freely available to benefit other researchers. Finally, we performed mirrored analyses in BioVU and MGI to assess competing analytic practices for EHR lab traits. We find that using the mean of all available lab measurements provides a robust summary value, but alternate summarizations can improve power in certain circumstances. This study provides a proof-of-principle for cross-health system GWAS and offers a framework for future studies of quantitative EHR lab traits.
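The abstract mentions two computational steps: collapsing each individual's repeated lab measurements into one summary value (the mean, or an alternate summarization), and meta-analyzing per-cohort association results. The following is a minimal Python sketch of both ideas, not the authors' pipeline. The function names, the input layouts, the list of alternate summaries, and the choice of an inverse-variance-weighted fixed-effect model are all assumptions made for illustration; the abstract does not specify them, and the numbers are made up.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, median


def summarize_labs(records, how="mean"):
    """Collapse repeated lab measurements into one value per individual.

    records: iterable of (person_id, value) pairs in chronological order
             (hypothetical layout, not taken from the paper).
    how: "mean", "median", "first", "min" or "max" (illustrative choices).
    """
    reducers = {
        "mean": mean,
        "median": median,
        "first": lambda values: values[0],
        "min": min,
        "max": max,
    }
    per_person = defaultdict(list)
    for person_id, value in records:
        per_person[person_id].append(float(value))
    reducer = reducers[how]
    return {pid: reducer(values) for pid, values in per_person.items()}


def fixed_effect_meta(estimates):
    """Inverse-variance-weighted fixed-effect meta-analysis (assumed model).

    estimates: list of (beta, standard_error) pairs, one per cohort.
    Returns the pooled beta, its standard error, and the z-score.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total_weight
    pooled_se = sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se, pooled_beta / pooled_se


# Made-up measurements for two hypothetical patients.
records = [("p1", 4.0), ("p1", 6.0), ("p1", 11.0), ("p2", 5.0)]
means = summarize_labs(records, how="mean")
medians = summarize_labs(records, how="median")

# Made-up per-cohort effect estimates (beta, SE) for a single variant.
beta, se, z = fixed_effect_meta([(0.10, 0.02), (0.14, 0.04)])
```

In this toy input, the third measurement for `p1` pulls the mean above the median, which hints at why the choice of summary can matter for traits with occasional extreme values. A real analysis would also need unit harmonization, outlier handling, and covariate adjustment, none of which is sketched here.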