Department of Computational Medicine and Bioinformatics, Ann Arbor, MI, USA.
Center for Statistical Genetics, Ann Arbor, MI, USA.
Hum Mol Genet. 2018 May 1;27(R1):R14-R21. doi: 10.1093/hmg/ddy081.
The combination of electronic health records (EHRs) with genetic data has ushered in the next wave of complex disease genetics. Population-based biobanks and other large cohorts provide sufficient sample sizes to identify novel genetic associations across the hundreds to thousands of phenotypes gleaned from EHRs. In this review, we summarize the current state of these EHR-linked biobanks, explore ongoing methods development in the field and highlight recent discoveries of genetic associations. We enumerate the many existing biobanks with EHRs linked to genetic data, many of which are available to researchers via application and contain sample sizes >50 000. We also discuss the computational and statistical considerations for analysis of such large datasets including mixed models, phenotype curation and cloud computing. Finally, we demonstrate how genome-wide association studies and phenome-wide association studies have identified novel genetic findings for complex diseases, specifically cardiometabolic traits. As more researchers employ innovative hypotheses and analysis approaches to study EHR-linked biobanks, we anticipate a richer understanding of the genetic etiology of complex diseases.