Marker effect p-values for single-step GWAS with the algorithm for proven and young in large genotyped populations.

Affiliation

Department of Animal and Dairy Science, University of Georgia, Athens, GA, 30602, USA.

Publication information

Genet Sel Evol. 2024 Aug 22;56(1):59. doi: 10.1186/s12711-024-00925-3.

DOI:10.1186/s12711-024-00925-3

PMID:39174924

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11340074/

Abstract

BACKGROUND

Single-nucleotide polymorphism (SNP) effects can be backsolved from ssGBLUP genomic estimated breeding values (GEBV) and used for genome-wide association studies (ssGWAS). However, obtaining p-values for those SNP effects relies on the inversion of dense matrices, which poses computational limitations in large genotyped populations. In this study, we present a method to approximate SNP p-values for ssGWAS with many genotyped animals. This method relies on the combination of a sparse approximation of the inverse of the genomic relationship matrix ($\mathbf{G}_{\mathrm{APY}}^{-1}$) built with the algorithm for proven and young (APY) and an approximation of the prediction error variance of SNP effects which does not require the inversion of the left-hand side (LHS) of the mixed model equations. To test the proposed p-value computing method, we used a reduced genotyped population of 50K genotyped animals and compared the approximated SNP p-values with benchmark p-values obtained with the direct inverse of LHS built with an exact genomic relationship matrix ($\mathbf{G}$). Then, we applied the proposed approximation method to obtain SNP p-values for a larger genotyped population composed of 450K genotyped animals.
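To make the sparse inverse concrete, the following is a minimal sketch of the standard APY construction, in which genotyped animals are split into core and noncore groups, only the core block of the genomic relationship matrix is inverted directly, and the noncore block of the inverse is diagonal. It is not the authors' implementation: the function name `apy_inverse` and the toy data are assumptions made for illustration, and the result is stored densely here purely for readability.

```python
import numpy as np

def apy_inverse(G, core, noncore):
    """Illustrative APY inverse of a genomic relationship matrix G.

    core / noncore are index lists partitioning the animals (hypothetical
    inputs). Only the core block is inverted directly; the noncore block
    of the result is diagonal.
    """
    core = np.asarray(core)
    noncore = np.asarray(noncore)
    Gcc = G[np.ix_(core, core)]
    Gcn = G[np.ix_(core, noncore)]
    Gcc_inv = np.linalg.inv(Gcc)

    # P = Gcc^{-1} Gcn: regression of each noncore animal on the core animals
    P = Gcc_inv @ Gcn

    # Diagonal of M_nn: m_i = g_ii - g_ic Gcc^{-1} g_ci for each noncore animal i
    m = G[noncore, noncore] - np.einsum("ij,ij->j", Gcn, P)

    n = G.shape[0]
    Ginv = np.zeros((n, n))
    Ginv[np.ix_(core, core)] = Gcc_inv + (P / m) @ P.T
    Ginv[np.ix_(core, noncore)] = -P / m
    Ginv[np.ix_(noncore, core)] = (-P / m).T
    Ginv[noncore, noncore] = 1.0 / m  # noncore block stays diagonal
    return Ginv

# Toy example: 6 animals, 40 simulated markers (made-up data)
rng = np.random.default_rng(0)
Z = rng.standard_normal((6, 40))
G_toy = Z @ Z.T / 40 + 0.01 * np.eye(6)
G_apy_inv = apy_inverse(G_toy, core=[0, 1, 2], noncore=[3, 4, 5])
```

Because the noncore block is diagonal, each additional noncore animal contributes one diagonal element and one column of core-by-noncore coefficients, which is why storage grows linearly in the number of noncore animals once the core set is fixed.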

RESULTS

The same genomic regions on chromosomes 7 and 20 were identified across all p-value computing methods when using 50K genotyped animals. In terms of computational requirements, obtaining p-values with the proposed approximation reduced the wall-clock time 38-fold and the memory requirement tenfold compared to using the exact inversion of the LHS. When the approximation was applied to a population of 450K genotyped animals, two new significant regions on chromosomes 6 and 14 were uncovered, indicating an increase in GWAS detection power when including more genotypes in the analyses. Obtaining p-values with the approximation and 450K genotyped individuals took 24.5 wall-clock hours and 87.66 GB of memory, a cost that is expected to increase linearly with the addition of noncore genotyped individuals.
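Significant regions such as these are called from per-SNP p-values. As a rough sketch of that last step, assuming each estimated SNP effect is divided by the square root of its prediction error variance and the resulting statistic is treated as standard normal, a two-sided p-value can be computed as below. The function name `snp_p_values` and the input numbers are hypothetical, and the abstract does not specify how the approximated variances enter the test.

```python
import math

def snp_p_values(effects, pev):
    """Two-sided p-values from SNP effect estimates and their variances.

    effects: estimated SNP effects (hypothetical input)
    pev:     prediction error variance of each SNP effect (hypothetical input)
    """
    p_values = []
    for a_hat, var in zip(effects, pev):
        z = abs(a_hat) / math.sqrt(var)
        # erfc(z / sqrt(2)) equals 2 * (1 - Phi(z)) for the standard normal CDF Phi
        p_values.append(math.erfc(z / math.sqrt(2.0)))
    return p_values

# Made-up numbers purely to show the call
example_p = snp_p_values(effects=[0.00, 0.02, -0.15], pev=[0.001, 0.001, 0.001])
```

The variances are the expensive input: in the exact approach they come from the inverse of the LHS, whereas the method summarized here replaces them with an approximation.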

CONCLUSIONS

With the proposed method, obtaining p-values for SNP effects in ssGWAS is computationally feasible in large genotyped populations. The computational cost of obtaining p-values in ssGWAS may no longer be a limitation in extensive populations with many genotyped animals.
