GAUSS: a summary-statistics-based R package for accurate estimation of linkage disequilibrium for variants, Gaussian imputation, and TWAS analysis of cosmopolitan cohorts.

Affiliations

Department of Statistics, Miami University, Oxford, OH 45056, United States.

Department of Psychiatry, Virginia Commonwealth University, Richmond, VA 23298, United States.

Publication information

Bioinformatics. 2024 Mar 29;40(4). doi: 10.1093/bioinformatics/btae203.

DOI: 10.1093/bioinformatics/btae203

PMID: 38632050

Original link: https://pmc.ncbi.nlm.nih.gov/articles/PMC11052653/

Abstract

MOTIVATION

As the availability of larger and more ethnically diverse reference panels grows, there is an increase in demand for ancestry-informed imputation of genome-wide association studies (GWAS), and other downstream analyses, e.g. fine-mapping. Performing such analyses at the genotype level is computationally challenging and necessitates, at best, a laborious process to access individual-level genotype and phenotype data. Summary-statistics-based tools, not requiring individual-level data, provide an efficient alternative that streamlines computational requirements and promotes open science by simplifying the re-analysis and downstream analysis of existing GWAS summary data. However, existing tools perform only disparate parts of needed analysis, have only command-line interfaces, and are difficult to extend/link by applied researchers.

RESULTS

To address these challenges, we present Genome Analysis Using Summary Statistics (GAUSS), a comprehensive and user-friendly R package designed to facilitate the re-analysis and downstream analysis of GWAS summary statistics. GAUSS offers an integrated toolkit that covers (i) estimating the ancestry proportions of study cohorts, (ii) calculating ancestry-informed linkage disequilibrium, (iii) imputing summary statistics of unobserved variants, (iv) conducting transcriptome-wide association studies, and (v) correcting for "Winner's Curse" biases. Notably, GAUSS utilizes an expansive, multi-ethnic reference panel consisting of 32,953 genomes from 29 ethnic groups. This panel enhances the range and accuracy of imputable variants, including the ability to impute summary statistics of rarer variants. As a result, GAUSS elevates the quality and applicability of existing GWAS analyses without requiring access to subject-level genotypic and phenotypic information.
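To make item (iii) more concrete, the following is a minimal sketch of conditional-Gaussian imputation of a Z-score, the general idea behind summary-statistics imputation. It is not the GAUSS implementation or its API: the function names `solve_linear` and `impute_z`, the `ridge` parameter, and the toy numbers are all hypothetical and chosen only for illustration. The sketch assumes that Z-scores of nearby variants are jointly normal with correlation given by the LD matrix, so the Z-score of an unobserved variant can be predicted from the observed ones.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's matrix is not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("LD matrix is singular; try a larger ridge")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        rest = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - rest) / m[i][i]
    return x


def impute_z(z_obs, ld_obs, ld_cross, ridge=0.1):
    """Impute the Z-score of one unobserved variant (hypothetical helper).

    z_obs    : Z-scores of the observed variants
    ld_obs   : correlation (LD) matrix among the observed variants
    ld_cross : correlations between the unobserved variant and each observed one
    ridge    : value added to the diagonal to stabilise the solve (an assumption)

    Returns (imputed Z-score, quality), where quality is
    ld_cross' (ld_obs + ridge * I)^-1 ld_cross, a rough indicator of how much
    of the unobserved variant is captured by the observed ones.
    """
    n = len(z_obs)
    regularised = [
        [ld_obs[i][j] + (ridge if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    # The matrix is symmetric, so these weights equal (ld_obs + ridge*I)^-1 ld_cross.
    weights = solve_linear(regularised, ld_cross)
    z_imputed = sum(w * z for w, z in zip(weights, z_obs))
    quality = sum(w * r for w, r in zip(weights, ld_cross))
    return z_imputed, quality


# Toy example: two observed variants, one unobserved variant in LD with both.
z_hat, quality = impute_z(
    z_obs=[3.0, 1.0],
    ld_obs=[[1.0, 0.2], [0.2, 1.0]],
    ld_cross=[0.6, 0.3],
)
print(z_hat, quality)
```

In a cohort of mixed ancestry, the LD inputs would come from an ancestry-informed estimate such as the one described in item (ii), rather than from a single reference population.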

AVAILABILITY AND IMPLEMENTATION

The GAUSS R package and its source code are publicly available from our GitHub repository at https://github.com/statsleelab/gauss. Illustrative use-case scenarios are provided at https://statsleelab.github.io/gauss/, and a comprehensive user guide is given in Supplementary Text S1.
