Efficient permutation-based genome-wide association studies for normal and skewed phenotypic distributions.

Affiliations

Technical University of Munich, Campus Straubing for Biotechnology and Sustainability, Bioinformatics, 94315 Straubing, Germany.

Weihenstephan-Triesdorf University of Applied Sciences, Bioinformatics, 94315 Straubing, Germany.

Publication information

Bioinformatics. 2022 Sep 16;38(Suppl_2):ii5-ii12. doi: 10.1093/bioinformatics/btac455.

DOI: 10.1093/bioinformatics/btac455

PMID: 36124808

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9486594/

Abstract

MOTIVATION

Genome-wide association studies (GWAS) are an integral tool for studying the architecture of complex genotype and phenotype relationships. Linear mixed models (LMMs) are commonly used to detect associations between genetic markers and a trait of interest while accounting for population structure and cryptic relatedness. LMMs assume that the residuals are normally distributed and that the genetic markers are independent and identically distributed; both assumptions are often violated in real data. Permutation-based methods can help to overcome some of these limitations and provide more realistic thresholds for the discovery of true associations. In practice, however, they are rarely implemented because of their high computational complexity.
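LMMs account for population structure through a kinship matrix K. A standard technique for making the per-marker tests cheap is to eigendecompose K once and rotate the data so that ordinary least squares reproduces the generalized least squares (GLS) estimates of the mixed model. The sketch below illustrates that idea in NumPy. It is not taken from permGWAS; it assumes the variance ratio delta = sigma_e^2 / sigma_g^2 has already been estimated (normally from a null model without markers), and the helper names `lmm_rotate` and `ols` as well as the toy data are hypothetical.

```python
import numpy as np


def lmm_rotate(y, X, K, delta):
    """Hypothetical helper: whiten an LMM so that OLS yields the GLS estimates.

    Assumed model: y ~ N(X b, sigma_g^2 * (K + delta * I)), with the kinship
    matrix K and the variance ratio delta = sigma_e^2 / sigma_g^2 given
    (delta would normally come from a null model fitted without markers).
    """
    s, U = np.linalg.eigh(K)          # K = U diag(s) U^T, computed once
    w = 1.0 / np.sqrt(s + delta)      # per-eigenvector whitening weights
    y_rot = w * (U.T @ y)             # rotate and rescale the phenotype
    X_rot = w[:, None] * (U.T @ X)    # same transform for intercept/markers
    return y_rot, X_rot


def ols(y, X):
    """Ordinary least squares coefficients."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


# Toy data (illustrative only).
rng = np.random.default_rng(0)
n, p = 50, 200
G = rng.integers(0, 3, size=(n, p)).astype(float)  # 0/1/2 genotype codes
G -= G.mean(axis=0)                                 # center each marker
K = G @ G.T / p                                     # realized relationship matrix
X = np.column_stack([np.ones(n), G[:, 0]])          # intercept + one marker
y = rng.normal(size=n)
delta = 0.5                                         # assumed variance ratio

y_rot, X_rot = lmm_rotate(y, X, K, delta)
beta_rot = ols(y_rot, X_rot)

# Direct GLS for comparison: (X^T V^-1 X)^-1 X^T V^-1 y with V = K + delta * I.
V_inv = np.linalg.inv(K + delta * np.eye(n))
beta_gls = np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ y)
print(np.allclose(beta_rot, beta_gls))  # True
```

Because the eigendecomposition depends only on K, it can be computed once and reused for every marker and, in a permutation setting, for every permuted phenotype.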

RESULTS

We propose permGWAS, an efficient LMM reformulation based on 4D tensors that can provide permutation-based significance thresholds. We show that our method outperforms current state-of-the-art LMMs with respect to runtime and that permutation-based thresholds have lower false discovery rates for skewed phenotypes compared to the commonly used Bonferroni threshold. Furthermore, using permGWAS we re-analyzed more than 500 Arabidopsis thaliana phenotypes with 100 permutations each in less than 8 days on a single GPU. Our re-analyses suggest that applying a permutation-based threshold can improve and refine the interpretation of GWAS results.
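A permutation-based threshold is typically obtained by repeating the scan on permuted phenotypes, recording the smallest p-value of each permutation, and taking the alpha-quantile of those minima as the genome-wide threshold, whereas Bonferroni simply uses alpha/m for m markers. The sketch below contrasts the two on toy data with a skewed (exponential) phenotype. To show why batching pays off, all permuted phenotypes are stacked as columns of one matrix so that every marker-by-permutation statistic comes from a single matrix product. This is a 2D simplification, not the 4D-tensor LMM formulation of permGWAS: it assumes plain linear regression without kinship or covariates and a normal approximation for p-values (to avoid a SciPy dependency), and all function names are hypothetical.

```python
import math

import numpy as np


def assoc_stats(G, Y):
    """|t| of simple regressions of each column of Y (n x q) on each marker
    column of G (n x m), computed in one batch; returns shape (q, m).

    Simplification: one marker plus intercept per regression, no kinship.
    """
    n = G.shape[0]
    Gs = (G - G.mean(axis=0)) / G.std(axis=0)   # standardized markers
    Ys = (Y - Y.mean(axis=0)) / Y.std(axis=0)   # standardized phenotypes
    r = (Ys.T @ Gs) / n                         # Pearson correlations, (q, m)
    r = np.clip(r, -0.999999, 0.999999)         # guard against division by zero
    return np.abs(r) * np.sqrt((n - 2) / (1.0 - r**2))


def normal_pvalue(abs_t):
    """Two-sided p-value via a normal approximation (assumes large n)."""
    return np.vectorize(math.erfc)(abs_t / math.sqrt(2.0))


def permutation_threshold(G, y, n_perm=100, alpha=0.05, seed=0):
    """alpha-quantile of the per-permutation minimum p-values."""
    rng = np.random.default_rng(seed)
    # Batch every permuted phenotype as one column of an (n x n_perm) matrix.
    Y_perm = np.column_stack([rng.permutation(y) for _ in range(n_perm)])
    max_t = assoc_stats(G, Y_perm).max(axis=1)  # largest |t| per permutation
    min_p = normal_pvalue(max_t)                # = smallest p per permutation
    return float(np.quantile(min_p, alpha))


# Toy data (illustrative only): 0/1/2 genotypes and a skewed phenotype.
rng = np.random.default_rng(1)
n, m = 200, 1000
G = rng.integers(0, 3, size=(n, m)).astype(float)
y = rng.exponential(size=n)

alpha = 0.05
bonferroni = alpha / m
perm = permutation_threshold(G, y, n_perm=100, alpha=alpha)
print(f"Bonferroni: {bonferroni:.2e}, permutation-based: {perm:.2e}")
```

With only 100 permutations, the 5% quantile rests on the handful of smallest minima, so more permutations give a more stable threshold; batching them keeps the extra cost in a few large matrix products instead of a Python loop over markers.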

AVAILABILITY AND IMPLEMENTATION

permGWAS is open-source and publicly available on GitHub for download: https://github.com/grimmlab/permGWAS.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
