TERRA Teaching and Research Center, University of Liège, Gembloux Agro-Bio Tech (ULiège-GxABT), 5030 Gembloux, Belgium.
TERRA Teaching and Research Center, University of Liège, Gembloux Agro-Bio Tech (ULiège-GxABT), 5030 Gembloux, Belgium; Department of Animal Science, Shiraz University, 71441-13131 Shiraz, Iran.
J Dairy Sci. 2024 Nov;107(11):9615-9627. doi: 10.3168/jds.2023-24319. Epub 2024 Jul 4.
With the rapid development of animal phenomics and deep phenotyping, thousands of traditional and molecular phenotypes can now be obtained per individual. However, how to handle this volume of data in animal breeding remains largely unexplored, a challenge that is likely to become more common. This study aimed to (1) explore the use of the mega-scale linear mixed model (MegaLMM), a factor model-based approach that can simultaneously estimate (co)variance components and genetic parameters for thousands of milk traits, hereafter called thousand-trait (TT) models; (2) compare the predictions of phenotype values and genomic breeding values (u) for focal traits (i.e., the traits targeted for prediction, as opposed to secondary traits, which support the evaluation) from single-trait (ST) and TT models; and (3) propose a new approximate method of GEBV (U) prediction with TT models and MegaLMM. We used a total of 3,421 milk mid-infrared (MIR) spectra wavepoints (called secondary traits) and 3 focal traits (average fat percentage [AFP], average methane production [ACH4], and average SCS [ASCS]) collected on 3,302 first-parity Holstein cows. The 3,421 milk MIR wavepoint traits comprised 311 wavepoints in each of 11 classes (months in lactation). Genotypes for 564,439 SNPs were available for all animals and were used to calculate the genomic relationship matrix. The MegaLMM was implemented in the framework of the Bayesian sparse factor model and solved through Gibbs sampling (Markov chain Monte Carlo). The heritabilities of the studied 3,421 milk MIR wavepoints gradually increased and then decreased in units of 311 wavepoints throughout the lactation. The genetic and phenotypic correlations between the first 311 wavepoints and the other 3,110 wavepoints were low.
The accuracies of phenotype predictions from the ST model were lower than those from the TT model for AFP (0.51 vs. 0.93), ACH4 (0.30 vs. 0.86), and ASCS (0.14 vs. 0.33). The same trend was observed for the accuracies of u predictions for AFP (0.59 vs. 0.86), ACH4 (0.47 vs. 0.78), and ASCS (0.39 vs. 0.59). The average correlation between U predicted from the TT model and the new approximate method was 0.90. The new approximate method used for estimating U in MegaLMM will enhance the suitability of MegaLMM for applications in animal breeding. This study conducted an initial investigation into the application of thousands of traits in animal breeding and showed that the TT model is beneficial for the prediction of focal traits (phenotype and breeding values), especially for difficult-to-measure traits (e.g., ACH4).
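The abstract states that the SNP genotypes were used to calculate the genomic relationship matrix but does not say which formula was applied. As an illustration only, the sketch below assumes one widely used construction (VanRaden's first method), in which allele counts are centred by twice the allele frequency and the cross-product is scaled by 2Σp(1−p). The function name `vanraden_grm` and the toy genotypes are hypothetical and are not taken from the study.

```python
def vanraden_grm(genotypes):
    """Genomic relationship matrix from 0/1/2 allele counts.

    Rows are animals, columns are SNPs. Assumes VanRaden's first method;
    the source abstract does not confirm which method the authors used.
    """
    n_animals = len(genotypes)
    n_snps = len(genotypes[0])
    # Frequency of the counted allele at each SNP, estimated from the sample.
    freqs = [sum(row[j] for row in genotypes) / (2.0 * n_animals)
             for j in range(n_snps)]
    # Centre each allele count by its expected value 2p.
    centred = [[row[j] - 2.0 * freqs[j] for j in range(n_snps)]
               for row in genotypes]
    # Scaling by 2 * sum(p * (1 - p)) puts G on a scale comparable to a
    # pedigree-based relationship matrix.
    scale = 2.0 * sum(p * (1.0 - p) for p in freqs)
    if scale == 0.0:
        raise ValueError("all SNPs are monomorphic; G is undefined")
    return [[sum(a * b for a, b in zip(centred[i], centred[k])) / scale
             for k in range(n_animals)]
            for i in range(n_animals)]


# Toy data made up for illustration: 4 animals x 5 SNPs.
toy_genotypes = [
    [0, 1, 2, 1, 0],
    [1, 1, 2, 0, 0],
    [2, 0, 1, 1, 1],
    [1, 2, 0, 2, 1],
]
G = vanraden_grm(toy_genotypes)
```

With 3,302 animals and 564,439 SNPs, the same computation would normally be carried out with matrix routines rather than nested lists; the plain-Python form is used here only to keep each step visible.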