McBreen Jordan, Babar Md Ali, Jarquin Diego, Ampatzidis Yiannis, Khan Naeem, Kunwar Sudip, Acharya Janam Prabhat, Adewale Samuel, Brown-Guedira Gina
Department of Agronomy, University of Florida, Gainesville, Florida, USA.
Agricultural and Biological Engineering Department, Southwest Florida Research and Education Center, University of Florida, IFAS, Immokalee, Florida, USA.
Plant Genome. 2025 Mar;18(1):e20554. doi: 10.1002/tpg2.20554.
Integrating genomic, hyperspectral imaging (HSI), and environmental data enhances wheat yield predictions, with HSI providing detailed spectral information for predicting grain yield (GY), a complex trait. Combining HSI data with single nucleotide polymorphism (SNP) markers substantially improved predictive ability compared to conventional genomic prediction models. Predictive ability varied across years because of differing weather conditions. The most comprehensive parametric model tested, which included SNPs, HSI, and environmental covariates, consistently achieved the best results, closely followed by machine learning (ML) approaches using the same omics data. For example, under the forward prediction cross-validation scheme, the most comprehensive model (M9) predicted the GY of the 2023 growing season using data from 2021 and 2022, yielding a correlation of 0.53 between predicted and observed values. This model outperformed less complex models, underscoring the advantage of integrating multiple data sources and their interaction effects. Furthermore, when the top 25% of predicted lines were compared with the top 25% of observed lines ranked by GY, the M9 model returned a coincidence index (CI) of 55% (i.e., 55% of the lines were common to both sets), whereas the CI for the highest-performing ML model (gradient boosting regression) was 46%. This study highlights the potential of multi-data source approaches to accelerate the selection of heat-tolerant wheat genotypes.
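The two metrics reported in the abstract, the correlation between predicted and observed GY and the coincidence index over the top 25% of lines, can be illustrated with a minimal sketch. This is not the authors' code: the function name `coincidence_index`, the floor-based rounding of the top-set size, and the toy yield values are all assumptions made for illustration only.

```python
import numpy as np


def coincidence_index(predicted, observed, top_fraction=0.25):
    """Share of lines common to the top fraction of predicted and observed values.

    Hypothetical helper: the size of the top set is floor(top_fraction * n),
    with a minimum of one line. The paper may define the cut-off differently.
    """
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    if predicted.shape != observed.shape:
        raise ValueError("predicted and observed must have the same length")

    n_top = max(1, int(top_fraction * predicted.size))
    # Indices of the n_top largest values in each vector.
    top_predicted = set(np.argsort(-predicted)[:n_top].tolist())
    top_observed = set(np.argsort(-observed)[:n_top].tolist())
    return len(top_predicted & top_observed) / n_top


# Toy values for eight lines (made up, not data from the study).
predicted_gy = [5.1, 4.8, 3.9, 4.2, 3.0, 2.7, 4.9, 3.3]
observed_gy = [4.9, 3.5, 4.0, 5.2, 3.1, 2.9, 4.7, 3.2]

# Predictive ability as the Pearson correlation between predicted and observed.
predictive_ability = np.corrcoef(predicted_gy, observed_gy)[0, 1]
ci = coincidence_index(predicted_gy, observed_gy)
print(f"correlation = {predictive_ability:.2f}, CI = {ci:.0%}")
```

In this toy case the top 25% contains two lines per set and one of them is shared, so the CI is 50%. Under the forward prediction scheme described above, `predicted_gy` would come from a model fitted on earlier seasons only and `observed_gy` from the held-out season.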