Department of Mathematics and Statistics, University of Otago, Dunedin, New Zealand.
J Anim Sci. 2019 Mar 1;97(3):1090-1100. doi: 10.1093/jas/skz010.
The inverses of the pedigree and genomic relationship matrices (A, G) are required for single-step GBLUP (ssGBLUP). Whereas A can be inverted for millions of animals at a linear cost, inverting G has a cubic cost and is feasible for at most 150,000 animals with the current conventional algorithms. The algorithm for proven and young (APY) approximates the regular ssGBLUP by splitting genotyped animals into core and noncore groups, with computational costs that are cubic for core animals and linear for noncore animals. The data consisted of 9,406,096 animals in the pedigree, 6,243,753 weaning weight phenotypes, and 46,949 genotyped animals from 5 breeds, composites, and animals with missing breed information from New Zealand. To find a core sample for a multibreed sheep population that provides evaluations similar to those from the regular ssGBLUP, different core types and core sizes were studied. The core types Random, Composite, Oldest, Youngest, the most inbred animals in G (GINB), and the most inbred animals in A (AINB) were studied at core sizes of 5K, 10K, and 20K (K = 1,000). A Romney core was studied at 5K and 10K, and a Coopworth-Perendale core was studied at 5K. The correlation and the regression coefficient (slope) between GEBV from the non-APY and the APY analyses, used as indicators of consistency with non-APY and of bias from non-APY, respectively, showed a large impact of APY on noncore animals and a small impact on nongenotyped animals. Breed-based 5K cores resulted in large bias from non-APY, even for nongenotyped animals. Random and GINB at the 20K core size resulted in the highest consistency with non-APY and the lowest bias from non-APY. However, GINB did not perform as well as Random at smaller core sizes. The number of animals from a breed in the core sample was very important for the evaluation of that breed: cores without Texel or Highlander animals resulted in poor evaluations for those breeds.
When solving the mixed model equations, the smallest core size within each core type, and the Random core within each core size, converged in the fewest iterations. However, APY per se did not necessarily reduce the solving time. Random cores performed the best, as they gave good coverage of the generations and breeds and were therefore representative of the genotyped population. The 20K core size performed better than 5K and 10K, and the optimum core size was found to be 18.8K, according to the eigenvalue decomposition of G.
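
To make the core/noncore split concrete, the following is a minimal NumPy sketch of an APY-type inverse of G and of an eigenvalue-based core-size count. It is an illustration under stated assumptions, not the software used in the study: the function names `apy_inverse` and `n_eigen_for_variance` are hypothetical, the sparse-inverse form follows the commonly cited APY formulation (direct inversion of the core block only, plus one conditional variance per noncore animal), and the 0.98 variance fraction is an assumed default that is not stated in the abstract.

```python
import numpy as np


def apy_inverse(G, core):
    """Approximate inverse of G under APY, given the indices of core animals.

    Hypothetical helper for illustration; assumes G is symmetric positive
    definite and that `core` lists distinct row/column indices of G.
    """
    n = G.shape[0]
    core = np.asarray(core)
    noncore = np.setdiff1d(np.arange(n), core)

    Gcc = G[np.ix_(core, core)]
    Gcn = G[np.ix_(core, noncore)]

    # Only the core block is inverted directly (cubic in the core size).
    Gcc_inv = np.linalg.inv(Gcc)

    # Column j of P regresses noncore animal j on the core animals.
    P = Gcc_inv @ Gcn

    # One conditional variance per noncore animal (linear in their number).
    m = G[noncore, noncore] - np.einsum("ij,ij->j", Gcn, P)

    PM = P / m  # divides column j of P by m[j]
    Ginv = np.zeros((n, n), dtype=float)
    Ginv[np.ix_(core, core)] = Gcc_inv + PM @ P.T
    Ginv[np.ix_(core, noncore)] = -PM
    Ginv[np.ix_(noncore, core)] = -PM.T
    Ginv[noncore, noncore] = 1.0 / m  # diagonal noncore block
    return Ginv


def n_eigen_for_variance(G, fraction=0.98):
    """Number of largest eigenvalues of G whose sum reaches `fraction` of the total.

    The 0.98 default is an assumption made for this sketch.
    """
    vals = np.linalg.eigvalsh(G)[::-1]  # eigenvalues in descending order
    cum = np.cumsum(vals) / vals.sum()
    return min(int(np.searchsorted(cum, fraction)) + 1, len(vals))
```

In this sketch the dense work is confined to the core block, while each noncore animal contributes one column of regression coefficients and one diagonal element, which is where the cubic-for-core and linear-for-noncore costs come from. With a single noncore animal the result coincides with the exact inverse of G, which is a convenient sanity check.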