John Maura, Korte Arthur, Todesco Marco, Grimm Dominik G
Technical University of Munich, TUM Campus Straubing for Biotechnology and Sustainability, Bioinformatics, 94315 Straubing, Germany.
Weihenstephan-Triesdorf University of Applied Sciences, Bioinformatics, 94315 Straubing, Germany.
Bioinform Adv. 2024 Oct 28;4(1):vbae168. doi: 10.1093/bioadv/vbae168. eCollection 2024.
Permutation-based significance thresholds have been shown to be a robust alternative to classical Bonferroni significance thresholds in genome-wide association studies (GWAS) for skewed phenotype distributions. The recently published method permGWAS introduced a batch-wise approach to efficiently compute permutation-based GWAS. However, running multiple univariate tests in parallel leads to many redundant computations and increases the computational resources required. More importantly, traditional permutation methods that permute only the phenotype break the underlying population structure.
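To make the idea of a permutation-based threshold concrete, the following is a minimal sketch of the traditional scheme referred to above, in which only the phenotype is permuted. It is not the permGWAS implementation: it assumes a plain marker-wise correlation test without a mixed model or kinship matrix, polymorphic markers, and simulated toy data. The names `max_abs_correlation` and `permutation_threshold` are hypothetical.

```python
import numpy as np


def max_abs_correlation(X, y):
    """Largest absolute marker-phenotype correlation across all markers.

    Assumes every column of X is polymorphic (non-zero variance).
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return float(np.max(np.abs(r)))


def permutation_threshold(X, y, n_perm=200, alpha=0.05, seed=0):
    """Genome-wide threshold from the null distribution of the maximum statistic.

    Each permutation shuffles only the phenotype, which is the traditional
    scheme; it ignores any population structure in the sample.
    """
    rng = np.random.default_rng(seed)
    null_max = np.array(
        [max_abs_correlation(X, rng.permutation(y)) for _ in range(n_perm)]
    )
    return float(np.quantile(null_max, 1.0 - alpha))


# Toy data (simulated, for illustration only): 200 samples, 50 markers,
# one causal marker and skewed (exponential) noise.
rng = np.random.default_rng(42)
X = rng.integers(0, 3, size=(200, 50)).astype(float)
y = 2.0 * X[:, 0] + rng.exponential(scale=1.0, size=200)

thr = permutation_threshold(X, y)
observed = max_abs_correlation(X, y)
print(f"threshold on |r|: {thr:.3f}, strongest observed |r|: {observed:.3f}")
```

Because all markers share the same sample size, thresholding the maximum absolute correlation is equivalent to thresholding the minimum p-value across markers. Unlike a Bonferroni bound, the threshold is derived from the empirical null of the data at hand, so it adapts to the phenotype distribution.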
We propose permGWAS2, an improved method that does not break the population structure during permutations and uses an elegant block matrix decomposition to optimize computations, thereby reducing redundancies. We show on synthetic data that this improved approach yields a lower false discovery rate for skewed phenotype distributions compared to the previous version and the commonly used Bonferroni correction. In addition, we re-analyze a dataset covering phenotypic variation in 86 traits in a population of 615 wild sunflowers (Helianthus annuus L.). This led to the identification of dozens of novel associations with putatively adaptive traits, and removed several likely false-positive associations with limited biological support.
permGWAS2 is open-source and publicly available on GitHub for download: https://github.com/grimmlab/permGWAS.