Department of Animal and Dairy Science, University of Georgia, Athens, GA.
The Roslin Institute, The University of Edinburgh, Edinburgh, UK.
J Anim Sci. 2020 Dec 1;98(12). doi: 10.1093/jas/skaa374.
Single-step genomic best linear unbiased prediction with the Algorithm for Proven and Young (APY) is a popular method for large-scale genomic evaluations. With the APY algorithm, animals are designated as core or noncore, and the computing resources to create the inverse of the genomic relationship matrix (GRM) are reduced by inverting only a portion of that matrix for core animals. However, using different core sets of the same size causes fluctuations in genomic estimated breeding values (GEBVs) up to one additive standard deviation without affecting prediction accuracy. About 2% of the variation in the GRM is noise. In the recursion formula for APY, the error term modeling the noise is different for every set of core animals, creating changes in breeding values. Average changes are small, and correlations between breeding values estimated with different core animals are close to 1.0; however, normal distribution theory implies that outliers can be several times bigger than the average. Tests included commercial datasets from beef and dairy cattle and from pigs. Beyond a certain number of core animals, the prediction accuracy did not improve, but fluctuations decreased with more animals. Fluctuations were much smaller than the possible changes based on prediction error variance. GEBVs change over time even for animals with no new data because genomic relationships tie all the genotyped animals together, causing reranking of top animals. In contrast, changes in nongenomic models without new data are small. GEBVs can also change due to details in the model, such as redefinition of contemporary groups or unknown parent groups. In particular, increasing the fraction of blending of the GRM with a pedigree relationship matrix from 5% to 20% caused changes in GEBVs up to 0.45 SD, with a correlation of GEBVs > 0.99.
Fluctuations in genomic predictions are part of genomic evaluation models and are also present without the APY algorithm when genomic evaluations are computed with updated data. The best approach to reduce the impact of fluctuations in genomic evaluations is to make selection decisions not on individual animals with limited individual accuracy but on groups of animals with high average accuracy.
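The core/noncore partition described above can be illustrated with a small numerical sketch. The abstract does not state the APY formula; the code below assumes the partitioned inverse commonly given in the APY literature, in which only the core block of the GRM is inverted densely and each noncore animal contributes one diagonal error variance from the recursion. The genotypes are simulated, the ridge of 0.05 is only there to make the toy GRM invertible (it is not the pedigree blending discussed above), and the name `apy_inverse` is hypothetical.

```python
import numpy as np

def apy_inverse(G, core):
    """Sketch of an APY-style inverse of G for the given core indices.

    Assumed form: only G_cc is inverted densely; noncore animals enter
    through recursion coefficients P = G_cc^{-1} G_cn and a diagonal
    matrix M of error variances m_i = g_ii - g_ic G_cc^{-1} g_ci.
    """
    n = G.shape[0]
    core = np.asarray(core)
    noncore = np.setdiff1d(np.arange(n), core)

    Gcc = G[np.ix_(core, core)]
    Gcn = G[np.ix_(core, noncore)]

    Gcc_inv = np.linalg.inv(Gcc)   # the only dense inversion
    P = Gcc_inv @ Gcn              # core x noncore recursion coefficients

    # Error variance of each noncore animal given the core animals.
    m = np.diag(G)[noncore] - np.einsum("ij,ij->j", Gcn, P)
    m_inv = 1.0 / m

    PM = P * m_inv                 # scales column j of P by 1 / m_j
    Ginv = np.zeros_like(G)
    Ginv[np.ix_(core, core)] = Gcc_inv + PM @ P.T
    Ginv[np.ix_(core, noncore)] = -PM
    Ginv[np.ix_(noncore, core)] = -PM.T
    Ginv[noncore, noncore] = m_inv # diagonal noncore block
    return Ginv

# Toy data (hypothetical): 60 animals, 500 markers coded 0/1/2.
rng = np.random.default_rng(0)
Z = rng.integers(0, 3, size=(60, 500)).astype(float)
Z -= Z.mean(axis=0)                               # center each marker
G = Z @ Z.T / Z.shape[1] + 0.05 * np.eye(60)      # ridge keeps G invertible

# Two different core sets of the same size.
perm = rng.permutation(60)
core_a, core_b = perm[:20], perm[20:40]
Ginv_a = apy_inverse(G, core_a)
Ginv_b = apy_inverse(G, core_b)

# The two inverses differ because the error terms depend on the core set.
print("max abs difference between inverses:", np.abs(Ginv_a - Ginv_b).max())
```

In this sketch the difference between `Ginv_a` and `Ginv_b` comes entirely from the choice of core animals, which is the mechanism the abstract identifies as the source of GEBV fluctuations; the toy example says nothing about the size of those fluctuations in real data.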