National Research Institute of Animal Production, Krakowska 1, 32-083, Balice, Poland.
Biostatistics Group, Department of Genetics, the Wroclaw University of Environmental and Life Sciences, Kozuchowska 7, 51-631, Wroclaw, Poland.
Genet Sel Evol. 2023 Nov 23;55(1):82. doi: 10.1186/s12711-023-00856-5.
The single-step model is becoming increasingly popular for national genetic evaluations of dairy cattle because of the benefits it offers, such as joint breeding value estimation for genotyped and ungenotyped animals. However, the complexity of the model, which stems from a large number of correlated effects, can lead to significant computational challenges, especially regarding the accuracy and efficiency of the preconditioned conjugate gradient method used for estimation. The aim of this study was to investigate the effect of pedigree depth on the model's overall convergence rate, as well as on the convergence of different components of the model, in the context of the single-step single nucleotide polymorphism best linear unbiased prediction (SNP-BLUP) model.
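For readers unfamiliar with the solver named above, the following is a minimal sketch of a preconditioned conjugate gradient iteration for a small symmetric positive definite system. The Jacobi (diagonal) preconditioner, the relative-residual stopping rule, and the 3 x 3 toy system are assumptions made for illustration; they are not taken from the study and do not represent its software or its mixed model equations.

```python
def matvec(A, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def dot(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def pcg(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A.

    Uses a Jacobi preconditioner (inverse of the diagonal of A), which is
    an assumption of this sketch. Returns the solution and the number of
    iterations performed.
    """
    n = len(b)
    x = [0.0] * n
    b_norm = dot(b, b) ** 0.5
    if b_norm == 0.0:
        return x, 0  # zero right-hand side: the zero vector is the solution
    inv_diag = [1.0 / A[i][i] for i in range(n)]
    r = list(b)  # residual b - A x for the starting guess x = 0
    z = [m * ri for m, ri in zip(inv_diag, r)]  # preconditioned residual
    p = list(z)  # first search direction
    rz = dot(r, z)
    for iteration in range(1, max_iter + 1):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)  # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        # Stop once the relative residual norm falls below the tolerance.
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, iteration
        z = [m * ri for m, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]  # new conjugate direction
        rz = rz_new
    return x, max_iter


# Toy system (hypothetical values, not data from the study).
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
solution, iterations = pcg(A, b)
print(solution, iterations)
```

The iteration count returned here is the kind of quantity a convergence comparison between datasets would track, although in practice the system is far larger and is not stored as a dense matrix.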
The results demonstrate that the dataset with a truncated pedigree converged twice as fast as the full dataset. Still, the breeding values predicted from the two datasets showed very high Pearson correlations. In addition, comparing the top 50 bulls between the two datasets revealed a high correlation between their rankings. We also analysed the specific convergence patterns underlying different animal groups and model effects, which revealed heterogeneity in convergence behaviour. Effects of SNPs converged the fastest, while those of genetic groups converged the slowest, which reflects the difference in the information content available in the dataset for those effects. Pre-selecting SNPs based on minor allele frequency had no impact on either the rate or the pattern of convergence of SNP effects. Among different groups of individuals, genotyped animals with phenotype data converged the fastest, while ungenotyped animals without own records required the largest number of iterations.
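The two agreement measures mentioned above (Pearson correlation of predicted breeding values, and correlation of rankings among the top animals) can be sketched as follows. All function names and all numbers are hypothetical and chosen only for illustration; they are not results from the study. Taking the top animals from the full-dataset run and assuming no tied values are also assumptions of this sketch.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def ranks(values):
    """Rank each value, with rank 1 for the highest (ties assumed absent)."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    result = [0.0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = float(rank)
    return result


def top_k_rank_correlation(full, truncated, k):
    """Correlate the ranks from both runs for the top k animals of the full run."""
    top = sorted(range(len(full)), key=lambda i: -full[i])[:k]
    rank_full, rank_truncated = ranks(full), ranks(truncated)
    return pearson([rank_full[i] for i in top],
                   [rank_truncated[i] for i in top])


# Hypothetical breeding values for six animals from two runs (toy numbers).
ebv_full = [1.20, 0.85, 0.40, -0.10, 0.95, 0.30]
ebv_truncated = [1.18, 0.80, 0.45, -0.12, 0.97, 0.28]

value_correlation = pearson(ebv_full, ebv_truncated)
rank_correlation = top_k_rank_correlation(ebv_full, ebv_truncated, k=3)
print(value_correlation, rank_correlation)
```

With real evaluations, `k` would be 50 and the inputs would be the breeding values of the same bulls from the full and truncated pedigree runs.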
We conclude that pedigree structure markedly impacts the convergence rate of the optimisation, which is more efficient for the truncated dataset than for the full dataset.