Xavier Alencar, Runcie Daniel, Habier David
Corteva Agrisciences, Seed Product Development, 8305 NW 62nd Ave, Johnston, IA 50131, USA.
Purdue University, Department of Agronomy, 915 Mitch Daniels Blvd, West Lafayette, IN 47907, USA.
Genetics. 2025 Apr 17;229(4). doi: 10.1093/genetics/iyae179.
Genomic prediction models that capture genotype-by-environment (GxE) interaction are useful for predicting site-specific performance because they leverage information among related individuals and correlated environments, but implementing such models is computationally challenging. This study describes the algorithms of several scalable approaches: 2 models with latent representations of GxE interactions, namely MegaLMM and MegaSEM, and an efficient multivariate mixed-model solver, namely Pseudo-expectation Gauss-Seidel (PEGS), which fits different covariance structures [unstructured, extended factor analytic (XFA), heteroskedastic compound symmetry (HCS)]. Accuracy and runtime are benchmarked on simulated scenarios with varying numbers of genotypes and environments. MegaLMM and the PEGS-based XFA and HCS models provided the highest accuracy under sparse testing with 100 testing environments. The PEGS-based unstructured model was orders of magnitude faster than restricted maximum likelihood (REML)-based multivariate genomic best linear unbiased prediction (GBLUP) while providing the same accuracy. MegaSEM had the lowest runtime, fitting a model with 200 traits and 20,000 individuals in approximately 5 min, and a model with 2,000 traits and 2,000 individuals in less than 3 min. With the Genomes to Fields data, the most accurate predictions were attained with the univariate model fitted across environments and by averaging environment-level genomic estimated breeding values (GEBVs) from models with HCS and XFA covariance structures.
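As a rough illustration of two ingredients named in the abstract, the sketch below builds a heteroskedastic compound symmetry covariance matrix and solves a small linear system with a plain Gauss-Seidel sweep. It is not the PEGS algorithm from the paper: the function names `hcs_covariance` and `gauss_seidel` are hypothetical, the HCS parameterization (one variance per environment, one shared correlation) is the textbook form and is assumed rather than taken from the source, and the numbers are made up for the example.

```python
import numpy as np

def hcs_covariance(sd, rho):
    """Assumed HCS form: each environment has its own variance, all pairs share one correlation."""
    sd = np.asarray(sd, dtype=float)
    corr = np.full((sd.size, sd.size), rho)
    np.fill_diagonal(corr, 1.0)
    return corr * np.outer(sd, sd)

def gauss_seidel(A, b, n_iter=200, tol=1e-10):
    """Generic Gauss-Seidel solve of A x = b (A symmetric positive definite); not PEGS itself."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    x = np.zeros_like(b)
    for _ in range(n_iter):
        max_change = 0.0
        for i in range(b.size):
            # Right-hand side for equation i without its diagonal term,
            # using the most recently updated values of the other unknowns.
            rhs = b[i] - A[i] @ x + A[i, i] * x[i]
            new_xi = rhs / A[i, i]
            max_change = max(max_change, abs(new_xi - x[i]))
            x[i] = new_xi
        if max_change < tol:
            break
    return x

# Toy example: three environments with illustrative standard deviations.
sigma = hcs_covariance(sd=[1.0, 2.0, 0.5], rho=0.6)
b = np.array([1.0, -2.0, 0.5])
x_gs = gauss_seidel(sigma, b)
x_direct = np.linalg.solve(sigma, b)  # direct solve, for comparison only
```

Updating one unknown at a time avoids forming or factorizing the full coefficient matrix, which is the general reason Gauss-Seidel-type solvers scale to large mixed-model systems; the paper's solver adds variance-component updates that this sketch does not attempt.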