Department of Animal and Dairy Science, University of Georgia, Athens, GA, 30602, USA.
Genet Sel Evol. 2024 Aug 22;56(1):59. doi: 10.1186/s12711-024-00925-3.
Single-nucleotide polymorphism (SNP) effects can be backsolved from ssGBLUP genomic estimated breeding values (GEBV) and used for genome-wide association studies (ssGWAS). However, obtaining p-values for those SNP effects relies on the inversion of dense matrices, which poses computational limitations in large genotyped populations. In this study, we present a method to approximate SNP p-values for ssGWAS with many genotyped animals. The method combines two components: a sparse approximation of the inverse of the genomic relationship matrix ($\mathbf{G}^{-1}_{\mathrm{APY}}$) built with the algorithm for proven and young (APY), and an approximation of the prediction error variance of SNP effects that does not require inverting the left-hand side (LHS) of the mixed model equations. To test the proposed p-value computing method, we used a reduced population of 50K genotyped animals and compared the approximated SNP p-values with benchmark p-values obtained from the direct inverse of the LHS built with an exact inverse of the genomic relationship matrix ($\mathbf{G}^{-1}$). Then, we applied the proposed approximation method to obtain SNP p-values for a larger population of 450K genotyped animals.
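As a rough illustration of the exact benchmark route described above (backsolving SNP effects from GEBV and deriving p-values from prediction error variances), the following sketch uses a plain GBLUP on genotyped animals only, with no fixed effects and known variance components. It is not the authors' implementation and not the proposed approximation: the function name `backsolve_snp_pvalues`, the toy data, and the simplification from ssGBLUP to GBLUP are all assumptions made for illustration. It does show where the dense LHS inverse enters, which is the step the abstract says becomes limiting.

```python
import numpy as np
from math import erfc, sqrt


def backsolve_snp_pvalues(M, p, y, sigma2_u, sigma2_e):
    """Exact toy GBLUP: GEBV -> SNP effects -> p-values (hypothetical helper).

    M: genotypes coded 0/1/2 (n animals x m SNP); p: base allele frequencies;
    y: phenotypes assumed already adjusted for fixed effects.
    """
    Z = M - 2.0 * p                       # center genotypes by allele frequency
    k = 2.0 * np.sum(p * (1.0 - p))       # scaling so that G = ZZ'/k
    G = Z @ Z.T / k
    G_inv = np.linalg.inv(G)

    n = len(y)
    lam = sigma2_e / sigma2_u
    lhs_inv = np.linalg.inv(np.eye(n) + lam * G_inv)  # dense LHS inverse
    gebv = lhs_inv @ y
    pev = lhs_inv * sigma2_e              # prediction error (co)variance of GEBV

    T = Z.T @ G_inv / k                   # maps GEBV to SNP effects
    a_hat = T @ gebv                      # backsolved SNP effects
    var_gebv = G * sigma2_u - pev         # Var(GEBV) = G*sigma2_u - PEV
    var_a = np.einsum("ij,jk,ik->i", T, var_gebv, T)  # diag(T Var(GEBV) T')

    z = a_hat / np.sqrt(var_a)
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    pvals = np.array([erfc(abs(v) / sqrt(2.0)) for v in z])
    return a_hat, var_a, pvals


# Toy data (entirely made up): 30 animals, 200 SNP.
rng = np.random.default_rng(0)
n, m = 30, 200
p = rng.uniform(0.2, 0.8, size=m)
M = rng.binomial(2, p, size=(n, m)).astype(float)
y = rng.normal(size=n)
sigma2_u, sigma2_e = 1.0, 2.0

a_hat, var_a, pvals = backsolve_snp_pvalues(M, p, y, sigma2_u, sigma2_e)
```

In this simplified setting the backsolved effects coincide with SNP-BLUP solutions obtained with a per-SNP variance of `sigma2_u / k`, which gives a convenient sanity check. Both `np.linalg.inv` calls scale cubically with the number of genotyped animals, which is why this exact route is only practical for small populations.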
The same genomic regions on chromosomes 7 and 20 were identified across all p-value computing methods when using 50K genotyped animals. In terms of computational requirements, obtaining p-values with the proposed approximation reduced the wall-clock time by a factor of 38 and the memory requirement by a factor of ten compared with the exact inversion of the LHS. When the approximation was applied to a population of 450K genotyped animals, two new significant regions on chromosomes 6 and 14 were uncovered, indicating an increase in GWAS detection power when more genotypes are included in the analyses. Obtaining p-values with the approximation for 450K genotyped individuals took 24.5 wall-clock hours and 87.66 GB of memory; both are expected to increase linearly with the addition of noncore genotyped individuals.
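One way to see why cost might grow only linearly with noncore animals is the block structure of the APY inverse as it is commonly written: a dense core block, a core-by-noncore block, and a diagonal noncore block. The sketch below builds that inverse for a toy matrix. The function name `apy_inverse`, the toy matrix, and the choice of core animals are assumptions, and the result is stored densely here only for readability; it is not taken from the paper.

```python
import numpy as np


def apy_inverse(G, core):
    """Toy APY inverse of G for a given set of core indices (hypothetical helper)."""
    n = G.shape[0]
    core = np.asarray(core)
    noncore = np.setdiff1d(np.arange(n), core)

    Gcc_inv = np.linalg.inv(G[np.ix_(core, core)])   # only the core block is inverted
    Gcn = G[np.ix_(core, noncore)]
    P = Gcc_inv @ Gcn                                # core x noncore

    # Diagonal of the noncore block: m_i = g_ii - g_ic Gcc^{-1} g_ci
    m_diag = G[noncore, noncore] - np.einsum("ij,ij->j", Gcn, P)

    G_inv = np.zeros((n, n))
    G_inv[np.ix_(core, core)] = Gcc_inv + (P / m_diag) @ P.T
    G_inv[np.ix_(core, noncore)] = -P / m_diag
    G_inv[np.ix_(noncore, core)] = (-P / m_diag).T
    G_inv[noncore, noncore] = 1.0 / m_diag           # diagonal noncore block
    return G_inv


# Toy positive-definite "relationship" matrix (made up): 12 animals, 5 core.
rng = np.random.default_rng(1)
Z = rng.normal(size=(12, 40))
G = Z @ Z.T / 40 + 0.01 * np.eye(12)
core = np.array([0, 2, 4, 6, 8])

G_apy_inv = apy_inverse(G, core)
```

Each additional noncore animal adds one column to `P` and one diagonal element to `m_diag`, while the dense inversion stays at the size of the core set. A production implementation would keep the three blocks separate rather than filling a full matrix.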
With the proposed method, obtaining p-values for SNP effects in ssGWAS is computationally feasible in large genotyped populations. The computational cost of obtaining p-values in ssGWAS may no longer be a limitation in extensive populations with many genotyped animals.