Department of Biological Sciences, Purdue University, West Lafayette, IN, USA.
Department of Computer Science, Purdue University, West Lafayette, IN, USA.
Sci Rep. 2022 May 17;12(1):8242. doi: 10.1038/s41598-022-12185-6.
The emergence of genome-wide association studies (GWAS) has led to the creation of large repositories of human genetic variation, creating enormous opportunities for genetic research and worldwide collaboration. Methods based on GWAS summary statistics seek to leverage such records, overcoming the barriers that often restrict access to individual-level data while also offering significant computational savings. Such summary-statistics-based applications include GWAS meta-analysis, with and without sample overlap, and case-case GWAS. We compare the performance of leading methods for summary-statistics-based genomic analysis and introduce a novel framework that unifies the usual summary-statistics-based implementations via the reconstruction of allelic and genotypic frequencies and counts (ReACt). First, we evaluate ASSET, METAL, and ReACt on both synthetic and real data for GWAS meta-analysis (with and without sample overlap) and find that, while all three methods are comparable in terms of power and error control, ReACt and METAL are faster than ASSET by a factor of at least one hundred. We then evaluate ReACt against an existing method for case-case GWAS and show comparable performance, with ReACt requiring minimal underlying assumptions and being more user-friendly. Finally, ReACt allows us to evaluate, for the first time, an implementation for calculating polygenic risk scores (PRS) for groups of cases and controls based on summary statistics. Our work demonstrates the power of GWAS summary-statistics-based methodologies, and the proposed method provides a unifying framework that extends the possibilities open to researchers seeking to understand the genetics of complex disease.
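To illustrate what a summary-statistics-based meta-analysis computes, the following is a minimal sketch of the textbook inverse-variance-weighted fixed-effect combination of per-study effect estimates for a single variant. It is not the ReACt algorithm, which the abstract describes as reconstructing allelic and genotypic counts, and it assumes the studies share no samples. The function name `inverse_variance_meta` and its arguments are hypothetical and chosen only for this illustration.

```python
import math
from typing import Sequence, Tuple


def inverse_variance_meta(
    betas: Sequence[float], ses: Sequence[float]
) -> Tuple[float, float, float, float]:
    """Combine per-study effect sizes for one variant (illustrative sketch).

    betas: per-study effect estimates (e.g. log odds ratios), same effect allele.
    ses:   matching standard errors, all strictly positive.
    Assumes independent studies (no sample overlap).
    Returns (pooled_beta, pooled_se, z_score, two_sided_p).
    """
    if len(betas) == 0 or len(betas) != len(ses):
        raise ValueError("betas and ses must be non-empty and of equal length")
    if any(se <= 0 for se in ses):
        raise ValueError("standard errors must be positive")

    # Each study is weighted by the inverse of its sampling variance.
    weights = [1.0 / (se * se) for se in ses]
    total_weight = sum(weights)

    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z_score = pooled_beta / pooled_se

    # Two-sided p-value under a standard normal null distribution.
    p_value = math.erfc(abs(z_score) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z_score, p_value


if __name__ == "__main__":
    # Made-up numbers for demonstration only, not taken from the paper.
    beta, se, z, p = inverse_variance_meta([0.2, 0.2], [0.1, 0.1])
    print(f"beta={beta:.4f} se={se:.4f} z={z:.3f} p={p:.3g}")
```

Only effect sizes and standard errors are needed, so no individual-level genotypes are accessed. Handling sample overlap, as the methods compared in the abstract do, would require additional correction that this sketch deliberately omits.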