Department of Statistics, Miami University, Oxford, OH 45056, United States.
Department of Psychiatry, Virginia Commonwealth University, Richmond, VA 23298, United States.
Bioinformatics. 2024 Mar 29;40(4). doi: 10.1093/bioinformatics/btae203.
As larger and more ethnically diverse reference panels become available, demand is growing for ancestry-informed imputation in genome-wide association studies (GWAS) and for other downstream analyses, such as fine-mapping. Performing such analyses at the genotype level is computationally demanding and requires, at best, a laborious process to obtain access to individual-level genotype and phenotype data. Summary-statistics-based tools, which do not require individual-level data, offer an efficient alternative: they reduce computational requirements and promote open science by simplifying the re-analysis and downstream analysis of existing GWAS summary data. However, existing tools each perform only part of the required analysis, offer only command-line interfaces, and are difficult for applied researchers to extend or link together.
To address these challenges, we present Genome Analysis Using Summary Statistics (GAUSS), a comprehensive and user-friendly R package for the re-analysis and downstream analysis of GWAS summary statistics. GAUSS integrates tools for (i) estimating the ancestry proportions of study cohorts, (ii) calculating ancestry-informed linkage disequilibrium (LD), (iii) imputing summary statistics of unobserved variants, (iv) conducting transcriptome-wide association studies, and (v) correcting for "Winner's Curse" bias. Notably, GAUSS uses an expansive multi-ethnic reference panel of 32,953 genomes from 29 ethnic groups. This panel broadens the range of imputable variants and improves imputation accuracy, including the ability to impute summary statistics of rarer variants. As a result, GAUSS improves the quality and applicability of existing GWAS analyses without requiring access to subject-level genotypic and phenotypic information.
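Functionalities (ii) and (iii) can be illustrated with the conditional-Gaussian idea common in the summary-statistics literature: an unobserved variant's z-score is predicted from observed z-scores through their LD, here taken from an ancestry-weighted mix of reference populations. The Python sketch below is illustrative only. It assumes toy LD (correlation) matrices, known cohort ancestry proportions, a simple weighted average of per-population LD, and a ridge penalty. The functions `mix_ld`, `solve`, and `impute_z` and the `ridge` default are hypothetical; they are not part of the GAUSS R package, which may compute LD and imputed statistics differently.

```python
"""Hypothetical sketch (not the GAUSS API): ancestry-weighted LD followed by
conditional-Gaussian imputation of an unobserved variant's z-score."""
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def mix_ld(ld_by_pop: Sequence[Matrix], weights: Sequence[float]) -> Matrix:
    """Ancestry-weighted average of per-population LD (correlation) matrices.

    Weights are the cohort's assumed ancestry proportions; they are
    renormalized to sum to 1.
    """
    n = len(ld_by_pop[0])
    total = sum(weights)
    return [
        [sum(w * ld[i][j] for w, ld in zip(weights, ld_by_pop)) / total
         for j in range(n)]
        for i in range(n)
    ]


def solve(a: Matrix, b: Sequence[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def impute_z(ld: Matrix, observed: Sequence[int], z_obs: Sequence[float],
             target: int, ridge: float = 0.1) -> Tuple[float, float]:
    """Impute the z-score of variant `target` from observed z-scores.

    With c = LD[target, observed] and A = LD[observed, observed] + ridge * I:
      z_target = c^T A^{-1} z_obs
      info     = c^T A^{-1} c   (an r^2-like imputation-quality measure)
    The ridge term (a hypothetical default) stabilizes a near-singular A.
    """
    a = [[ld[i][j] + (ridge if i == j else 0.0) for j in observed]
         for i in observed]
    c = [ld[target][j] for j in observed]
    w = solve(a, c)  # A is symmetric, so c^T A^{-1} = (A^{-1} c)^T
    z_target = sum(wi * zi for wi, zi in zip(w, z_obs))
    info = sum(wi * ci for wi, ci in zip(w, c))
    return z_target, info


if __name__ == "__main__":
    # Toy LD for three variants in two hypothetical reference populations.
    pop_a = [[1.0, 0.8, 0.5], [0.8, 1.0, 0.6], [0.5, 0.6, 1.0]]
    pop_b = [[1.0, 0.4, 0.1], [0.4, 1.0, 0.2], [0.1, 0.2, 1.0]]
    ld = mix_ld([pop_a, pop_b], [0.75, 0.25])  # cohort assumed 75% / 25%
    z, info = impute_z(ld, observed=[0, 1], z_obs=[3.0, 2.0], target=2)
    print(f"imputed z = {z:.3f}, info = {info:.3f}")
```

The ridge term trades a little bias for numerical stability when observed variants are in near-perfect LD; with `ridge=0.0` the formula reduces to the unpenalized conditional mean of the target z-score given the observed ones.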
The GAUSS R package and its source code are publicly available from our GitHub repository at https://github.com/statsleelab/gauss. To further assist users, we provide illustrative use-case scenarios at https://statsleelab.github.io/gauss/, along with a comprehensive user guide in Supplementary Text S1.