Faculty of Life Sciences, Norwegian University of Life Sciences, 1432, Ås, Norway.
GENO SA, Storhamargata 44, 2317, Hamar, Norway.
Genet Sel Evol. 2024 Mar 1;56(1):17. doi: 10.1186/s12711-024-00881-y.
Since the very beginning of genomic selection, researchers have investigated methods to improve upon SNP-BLUP (single nucleotide polymorphism best linear unbiased prediction). SNP-BLUP gives equal weight to all SNPs, whereas many SNPs are expected not to lie near causal variants and thus not to have substantial effects. A recent approach to remedy this is to use genome-wide association study (GWAS) findings and increase the weights of GWAS top-SNPs in genomic predictions. Here, we employ a genome-wide approach, called GWABLUP, to integrate GWAS results into genomic prediction.
GWABLUP consists of the following steps: (1) performing a GWAS in the training data, which yields likelihood ratios; (2) smoothing the likelihood ratios over the SNPs; (3) combining the smoothed likelihood ratio with the prior probability of SNPs having non-zero effects, which yields the posterior probability of the SNPs; (4) calculating a weighted genomic relationship matrix using the posterior probabilities as weights; and (5) performing genomic prediction using the weighted genomic relationship matrix. Using high-density genotypes and milk, fat, protein and somatic cell count (SCC) phenotypes on dairy cows, GWABLUP was compared to GBLUP, to GBLUP (topSNPs), i.e. GBLUP with extra weights for GWAS top-SNPs, and to BayesGC, i.e. a Bayesian variable selection model. The GWAS resulted in six, five, four, and three genome-wide significant peaks for milk, fat and protein yield and SCC, respectively. GWABLUP genomic predictions were 10, 6, 7 and 1% more reliable than those of GBLUP for milk, fat and protein yield and SCC, respectively. GWABLUP was also more reliable than GBLUP (topSNPs) for all four traits, and more reliable than BayesGC for three of the traits. Although GWABLUP showed a tendency towards inflation bias for three of the traits, this was not statistically significant. In a multitrait analysis, GWABLUP yielded the highest accuracy for two of the traits. However, for SCC, which was relatively unrelated to the yield traits, including yield trait GWAS results reduced the reliability compared to a single-trait analysis.
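The five steps listed above can be illustrated with a minimal NumPy sketch on simulated data. The abstract does not give the formulas, so every concrete choice here is an assumption made for illustration rather than the authors' implementation: the single-SNP regression likelihood ratio, the moving-average smoother and its window, the prior of 0.01, the VanRaden-style scaling of the weighted relationship matrix, and the ridge parameter `lam`. All function names are hypothetical.

```python
import numpy as np

def gwas_likelihood_ratios(Z, y):
    """Step 1 (assumed form): Gaussian likelihood ratio of a single-SNP
    regression versus the null model, per SNP. Assumes no monomorphic SNPs."""
    n = len(y)
    yc = y - y.mean()
    rss0 = yc @ yc                          # residual SS of the null model
    Zc = Z - Z.mean(axis=0)
    zy = Zc.T @ yc                          # per-SNP cross-products with y
    zz = np.einsum("ij,ij->j", Zc, Zc)      # per-SNP sums of squares
    rss1 = rss0 - zy**2 / zz                # residual SS with the SNP fitted
    return (rss0 / rss1) ** (n / 2.0)

def smooth_lr(lr, window=5):
    """Step 2 (assumed smoother): moving average over neighbouring SNPs.
    `window` must be odd so the output has the same length as the input."""
    kernel = np.ones(window) / window
    padded = np.pad(lr, window // 2, mode="edge")
    return np.convolve(padded, kernel, mode="valid")

def posterior_prob(lr_smooth, prior):
    """Step 3: Bayes' rule, posterior odds = likelihood ratio * prior odds."""
    num = lr_smooth * prior
    return num / (num + (1.0 - prior))

def weighted_grm(Z, weights):
    """Step 4 (assumed scaling): G = Zc diag(w) Zc' / sum(w * 2p(1-p))."""
    p = Z.mean(axis=0) / 2.0                # allele frequencies
    Zc = Z - 2.0 * p                        # centred genotypes
    scale = np.sum(weights * 2.0 * p * (1.0 - p))
    return (Zc * weights) @ Zc.T / scale

def gblup(G, y_train, train_idx, test_idx, lam):
    """Step 5: GBLUP of validation animals; lam = residual / genetic variance."""
    G_tt = G[np.ix_(train_idx, train_idx)]
    G_vt = G[np.ix_(test_idx, train_idx)]
    rhs = y_train - y_train.mean()
    alpha = np.linalg.solve(G_tt + lam * np.eye(len(train_idx)), rhs)
    return G_vt @ alpha

# Simulated toy data: 250 animals, 400 SNPs, two causal SNPs.
rng = np.random.default_rng(1)
n, m = 250, 400
freq = rng.uniform(0.1, 0.9, m)
Z = rng.binomial(2, freq, size=(n, m)).astype(float)
beta = np.zeros(m)
beta[[50, 200]] = 1.0
y = Z @ beta + rng.normal(size=n)
train, test = np.arange(200), np.arange(200, 250)

lr = gwas_likelihood_ratios(Z[train], y[train])          # step 1
post = posterior_prob(smooth_lr(lr, window=5), 0.01)     # steps 2 and 3
G = weighted_grm(Z, post)                                # step 4
gebv = gblup(G, y[train], train, test, lam=1.0)          # step 5
```

The GWAS uses training animals only, so the validation phenotypes do not influence the SNP weights. With real data, the likelihood ratios would more plausibly be handled on the log scale to avoid overflow in large samples; the direct form is kept here only for readability.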
GWABLUP uses GWAS results to differentially weight all the SNPs in a weighted GBLUP genomic prediction analysis. GWABLUP yielded up to 10% and 13% more reliable genomic predictions than GBLUP for single-trait and multitrait analyses, respectively. Extension of GWABLUP to single-step analyses is straightforward.