Ray Susweta, Jarquin Diego, Howard Reka
Dep. of Statistics, Univ. of Nebraska-Lincoln, Lincoln, NE, 68583, USA.
Dep. of Agronomy, Univ. of Florida, Gainesville, FL, 32611, USA.
Plant Genome. 2023 Mar;16(1):e20263. doi: 10.1002/tpg2.20263. Epub 2022 Dec 9.
Soybean [Glycine max (L.) Merr.] is a significant source of protein and oil and is also widely used as animal feed. Thus, developing lines that are superior in yield, protein, and oil content is important for feeding the ever-growing population. Because phenotyping new lines in different environments (location-year combinations) is costly, genotyping offers breeders a cost- and time-efficient alternative. Several genomic prediction (GP) methods have been developed to use marker and environment data effectively to predict yield or other relevant phenotypic traits of crops. Our study compares the prediction accuracies of four methods for grain yield, oil, and protein: a conventional GP method (genomic best linear unbiased predictor [GBLUP]), a kernel method (Gaussian kernel [GK]), an artificial-intelligence (AI) method (deep learning [DL]), and a hybrid method that emulates a DL model with a kernel method (arc-cosine kernel [AK]). The data come from the soybean nested association mapping experiment (1,379 genotypes tested in six environments, with all genotypes observed in all environments). The relative performance of the four methods varied with the response variable and with whether the model included genotype × environment interaction (G×E) effects. GBLUP consistently performed better, GK and AK followed a pattern similar to that of GBLUP, and DL performed slightly worse than the other three methods in most cases, although the DL shortfall may partly reflect suboptimal hyperparameters. The DL method performed markedly worse than the other three methods in the presence of G×E effects.
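The three kernel-based methods named in the abstract (GBLUP, GK, AK) differ mainly in how they turn a marker matrix into a genotype-by-genotype similarity matrix. The sketch below illustrates one common form of each kernel. It is not the authors' implementation: the marker standardization, the bandwidth h, the median scaling of distances in GK, and the single-layer (order-1) arc-cosine form are assumptions, and all function names are hypothetical.

```python
import numpy as np

def standardize(X):
    """Center and scale marker columns; drop monomorphic markers (assumed preprocessing)."""
    sd = X.std(axis=0)
    keep = sd > 0
    return (X[:, keep] - X[:, keep].mean(axis=0)) / sd[keep]

def gblup_kernel(Z):
    """Linear genomic relationship matrix G = ZZ'/p."""
    return Z @ Z.T / Z.shape[1]

def gaussian_kernel(Z, h=1.0):
    """Gaussian kernel exp(-h * d^2 / median(d^2)); h and the median scaling are assumptions."""
    sq = np.sum(Z**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (Z @ Z.T), 0.0)
    med = np.median(d2[np.triu_indices(Z.shape[0], k=1)])
    return np.exp(-h * d2 / med)

def arc_cosine_kernel(Z):
    """Order-1 arc-cosine kernel, one hidden layer (assumed form):
    K = (1/pi) * |x||y| * (sin(theta) + (pi - theta) * cos(theta))."""
    dot = Z @ Z.T
    norms = np.sqrt(np.diag(dot))
    outer = np.outer(norms, norms)
    cos = np.clip(dot / outer, -1.0, 1.0)  # guard against rounding outside [-1, 1]
    theta = np.arccos(cos)
    return outer * (np.sin(theta) + (np.pi - theta) * cos) / np.pi

# Toy data: 20 genotypes x 50 markers coded 0/1/2 (simulated, not the SoyNAM data).
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(20, 50)).astype(float)
Z = standardize(X)
G, GK, AK = gblup_kernel(Z), gaussian_kernel(Z), arc_cosine_kernel(Z)
```

Each function returns an n × n symmetric matrix that could serve as the covariance structure of the genotype effect in a mixed model. Stacking additional hidden layers in the arc-cosine kernel, which is how such a kernel emulates a deeper network, is omitted here for brevity.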