Departamento de Matemáticas, Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, 44430, Guadalajara, Jalisco, Mexico.
International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz Km. 45, El Batán, CP 56237, Texcoco, Edo. de México, Mexico.
G3 (Bethesda). 2023 May 2;13(5). doi: 10.1093/g3journal/jkad045.
While several statistical machine learning methods have been developed and studied for assessing the genomic prediction (GP) accuracy of unobserved phenotypes in plant breeding research, few methods have linked genomics and phenomics (imaging). Deep learning (DL) neural networks have been developed to increase the GP accuracy of unobserved phenotypes while simultaneously accounting for the complexity of genotype-environment interaction (GE); however, unlike conventional GP models, DL has not been investigated for the case in which genomics is linked with phenomics. In this study, we used 2 wheat data sets (DS1 and DS2) to compare a novel DL method with conventional GP models. The models fitted for DS1 were GBLUP, gradient boosting machine (GBM), support vector regression (SVR), and the DL method. For 1 year, DL provided better GP accuracy than the other models. For the other years, however, the GBLUP model was slightly superior to DL. DS2 comprises only genomic data from wheat lines tested for 3 years, in 2 environments (drought and irrigated), and for 2-4 traits. In DS2, when the irrigated environment was predicted from the drought environment, DL had higher accuracy than the GBLUP model in all analyzed traits and years. When the drought environment was predicted from information on the irrigated environment, the DL and GBLUP models had similar accuracy. The DL method used in this study is novel and highly generalizable, because several modules can potentially be incorporated and concatenated to produce an output for a multi-input data structure.
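The idea of incorporating and concatenating several modules for a multi-input data structure can be illustrated with a minimal sketch. This is not the authors' architecture: the layer sizes, the random untrained weights, and the names `dense`, `init_layer`, and `multi_input_forward` are all assumptions made for illustration. Each input type (here, hypothetical marker codes and hypothetical image-derived features) passes through its own module, the module outputs are concatenated, and a linear head produces a single predicted phenotype.

```python
import random

def dense(x, weights, bias, relu=True):
    """Apply one fully connected layer to a single input vector."""
    out = []
    for w_row, b in zip(weights, bias):
        z = sum(w * xi for w, xi in zip(w_row, x)) + b
        out.append(max(0.0, z) if relu else z)
    return out

def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (untrained, illustration only)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def multi_input_forward(inputs, modules, head):
    """Run each input through its own module, concatenate, then predict."""
    hidden = []
    for x, (w, b) in zip(inputs, modules):
        hidden.extend(dense(x, w, b))  # concatenate the module outputs
    w_head, b_head = head
    return dense(hidden, w_head, b_head, relu=False)[0]

rng = random.Random(0)
n_markers, n_image = 6, 4  # hypothetical input sizes
modules = [init_layer(n_markers, 3, rng), init_layer(n_image, 2, rng)]
head = init_layer(3 + 2, 1, rng)  # head input size = sum of module output sizes

markers = [rng.choice([0.0, 1.0, 2.0]) for _ in range(n_markers)]  # assumed 0/1/2 coding
image_features = [rng.random() for _ in range(n_image)]
prediction = multi_input_forward([markers, image_features], modules, head)
print(prediction)
```

Under these assumptions, adding another data source only requires appending one more module to `modules` and widening the head's input to match, which is the sense in which such a design extends to additional inputs.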