Technical University of Munich, Campus Straubing for Biotechnology and Sustainability, Bioinformatics, 94315 Straubing, Germany.
Weihenstephan-Triesdorf University of Applied Sciences, Bioinformatics, 94315 Straubing, Germany.
Bioinformatics. 2022 Sep 16;38(Suppl_2):ii5-ii12. doi: 10.1093/bioinformatics/btac455.
Genome-wide association studies (GWAS) are an integral tool for studying the architecture of complex genotype and phenotype relationships. Linear mixed models (LMMs) are commonly used to detect associations between genetic markers and a trait of interest while accounting for population structure and cryptic relatedness. LMMs assume that the residuals are normally distributed and that the genetic markers are independent and identically distributed; both assumptions are often violated in real data. Permutation-based methods can help to overcome some of these limitations and provide more realistic thresholds for the discovery of true associations. Still, in practice, they are rarely implemented due to their high computational complexity.
We propose permGWAS, an efficient LMM reformulation based on 4D tensors that can provide permutation-based significance thresholds. We show that our method outperforms current state-of-the-art LMMs with respect to runtime and that permutation-based thresholds have lower false discovery rates for skewed phenotypes compared to the commonly used Bonferroni threshold. Furthermore, using permGWAS we re-analyzed more than 500 Arabidopsis thaliana phenotypes with 100 permutations each in less than 8 days on a single GPU. Our re-analyses suggest that applying a permutation-based threshold can improve and refine the interpretation of GWAS results.
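To illustrate the general idea of a permutation-based significance threshold, the following is a minimal sketch and not the permGWAS implementation. It assumes a plain marker-by-marker correlation test instead of an LMM, replaces the 4D tensor formulation with a single 2D matrix product, and uses simulated data; the function name `permutation_threshold` and all parameter names are hypothetical. The phenotype is permuted, the strongest association across all markers is recorded for each permutation, and the (1 - alpha) quantile of these maxima serves as the threshold.

```python
import numpy as np


def permutation_threshold(X, y, n_perm=100, alpha=0.05, seed=0):
    """Return the (1 - alpha) quantile of max |correlation| over markers,
    taken across n_perm permutations of the phenotype.

    X: (n_samples, n_markers) genotype matrix without monomorphic markers.
    y: (n_samples,) phenotype vector.
    """
    rng = np.random.default_rng(seed)
    # Center and scale each marker so a dot product equals a correlation.
    Xc = X - X.mean(axis=0)
    Xs = Xc / np.linalg.norm(Xc, axis=0)
    yc = y - y.mean()
    ys = yc / np.linalg.norm(yc)
    # Stack all permuted phenotypes into one (n_perm, n_samples) array.
    perms = np.stack([rng.permutation(ys) for _ in range(n_perm)])
    # One matrix product gives every marker statistic for every permutation.
    r = perms @ Xs  # shape: (n_perm, n_markers)
    # Keep only the strongest association per permutation.
    max_abs_r = np.abs(r).max(axis=1)
    return np.quantile(max_abs_r, 1.0 - alpha)


# Simulated example (assumed data): 200 samples, 500 markers, skewed phenotype.
rng = np.random.default_rng(1)
X = rng.binomial(2, 0.3, size=(200, 500)).astype(float)
y = rng.exponential(size=200)
threshold = permutation_threshold(X, y, n_perm=100, alpha=0.05)
print(f"permutation-based threshold on |r|: {threshold:.3f}")
```

A marker would then be called significant if its observed absolute correlation with the unpermuted phenotype exceeds `threshold`. Because the threshold is derived from the empirical distribution of the data, it does not rely on normally distributed residuals, which is the motivation given above for skewed phenotypes.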
permGWAS is open-source and publicly available on GitHub for download: https://github.com/grimmlab/permGWAS.
Supplementary data are available at Bioinformatics online.