Jones Steven Christopher, Cardone Katie M, Bradford Yuki, Tishkoff Sarah A, Ritchie Marylyn D
Genomics and Computational Biology Graduate Group, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA, USA.
Department of Genetics, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA, USA.
Pac Symp Biocomput. 2025;30:251-267. doi: 10.1142/9789819807024_0019.
Genome-wide association studies (GWAS) are an important tool for the study of complex disease genetics. Decisions regarding the quality control (QC) procedures employed as part of a GWAS can have important implications for the results and their biological interpretation. GWAS have been conducted predominantly in cohorts of European ancestry, but many initiatives aim to increase the representation of diverse ancestries in genetic studies. How these data should be combined, and what consequences genetic variation across ancestry groups might have for GWAS results, warrant further investigation. In this study, we focus on several commonly used methods for combining genetic data across diverse ancestry groups and the impact these decisions have on GWAS summary statistics. We ran GWAS on two binary phenotypes using ancestry-specific, multi-ancestry mega-analysis, and meta-analysis approaches. We found that while multi-ancestry mega-analysis and meta-analysis approaches can aid in identifying signals shared across ancestries, they can diminish the signal of ancestry-specific associations and modify their effect sizes. These results demonstrate that the choice of approach can affect downstream post-GWAS analyses and follow-up studies. Decisions regarding how the genetic data are combined have the potential to mask important findings that might serve individuals of ancestries that have been historically underrepresented in genetic studies. New methods that consider ancestry-specific variants in conjunction with shared variants need to be developed.
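As a rough illustration of how pooling can dilute an ancestry-specific association, the sketch below applies a fixed-effect inverse-variance-weighted meta-analysis to two hypothetical cohorts. The effect sizes and standard errors are invented for illustration, the function name `ivw_meta` is an assumption, and the abstract does not state which meta-analysis model the study used.

```python
import math


def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance-weighted meta-analysis.

    betas: per-cohort effect estimates (e.g. log odds ratios)
    ses:   per-cohort standard errors
    Returns the pooled effect, its standard error, and the z-score.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    return beta, se, beta / se


# Hypothetical variant: associated in cohort A only, no effect in the
# larger cohort B (smaller standard error means more weight in the pool).
betas = [0.30, 0.00]
ses = [0.05, 0.03]

z_cohort_a = betas[0] / ses[0]
pooled_beta, pooled_se, pooled_z = ivw_meta(betas, ses)

print(f"cohort A alone: beta={betas[0]:.3f}, z={z_cohort_a:.2f}")
print(f"pooled:         beta={pooled_beta:.3f}, z={pooled_z:.2f}")
```

With these made-up inputs the pooled effect is pulled toward zero and the z-score falls well below the cohort-specific value, which mirrors the kind of attenuation the abstract describes.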