A hybrid deep learning-based approach for optimal genotype by environment selection.

Author information

Khalilzadeh Zahra, Kashanian Motahareh, Khaki Saeed, Wang Lizhi

Affiliations

Department of Industrial and Manufacturing Systems Engineering, Iowa State University, Ames, IA, United States.

School of Industrial Engineering and Management, Oklahoma State University, Stillwater, OK, United States.

Publication information

Front Artif Intell. 2024 Dec 11;7:1312115. doi: 10.3389/frai.2024.1312115. eCollection 2024.

Abstract

The ability to accurately predict the yields of different crop genotypes in response to weather variability is crucial for developing climate-resilient crop cultivars. Genotype-environment interactions introduce large variations in crop-climate responses and are hard to factor into breeding programs. Data-driven approaches, particularly those based on machine learning, can help guide breeding efforts by accounting for genotype-environment interactions when making yield predictions. Using a new yield dataset containing 93,028 records of soybean hybrids across 159 locations, 28 states, and 13 years, with 5,838 distinct genotypes and daily weather data over a 214-day growing season, we developed two convolutional neural network (CNN) models: one that integrates a CNN with fully connected neural networks (CNN model), and another that adds a long short-term memory (LSTM) layer after the CNN component (CNN-LSTM model). Using the Generalized Ensemble Method (GEM), we combined the CNN-based models and optimized their weights to improve overall predictive performance. The dataset's unique genotype information on seeds enabled an investigation into the potential of planting different genotypes based on weather variables. We used the proposed GEM model to identify the best-performing genotypes across various locations and weather conditions by predicting yields for all potential genotypes in each specific setting. To assess the GEM model, we evaluated it on unseen genotype-location combinations, simulating real-world scenarios in which new genotypes are introduced. By combining the base models, the GEM ensemble achieved much better prediction accuracy than the CNN-LSTM model alone and slightly better accuracy than the CNN model, as measured by both RMSE and MAE on the validation and test sets. The proposed data-driven approach can be valuable for genotype selection in scenarios with limited testing years.
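The abstract does not spell out the GEM objective. A common formulation, assumed here, chooses nonnegative weights that sum to one and minimize the mean squared error of the weighted ensemble on held-out predictions. With exactly two base models (CNN and CNN-LSTM), that problem has a closed-form solution. The helper names (`fit_gem_weight`, `gem_predict`) and the toy numbers below are hypothetical illustrations, not the authors' code or data.

```python
import math
from typing import List, Sequence


def fit_gem_weight(y: Sequence[float], pred_a: Sequence[float], pred_b: Sequence[float]) -> float:
    """Return the weight w in [0, 1] for model A (model B receives 1 - w) that minimizes MSE.

    Hypothetical helper: assumes a convex combination of exactly two base models.
    """
    diff = [a - b for a, b in zip(pred_a, pred_b)]
    denom = sum(d * d for d in diff)
    if denom == 0.0:
        return 0.5  # identical base predictions: every weight gives the same MSE
    # Setting d/dw sum((b + w*(a - b) - y)^2) = 0 gives w = sum(d*(y - b)) / sum(d^2).
    num = sum(d * (yi - bi) for d, yi, bi in zip(diff, y, pred_b))
    # The objective is a convex quadratic in w, so clipping to [0, 1] gives the constrained optimum.
    return min(1.0, max(0.0, num / denom))


def gem_predict(w: float, pred_a: Sequence[float], pred_b: Sequence[float]) -> List[float]:
    """Weighted ensemble prediction: w * A + (1 - w) * B."""
    return [w * a + (1.0 - w) * b for a, b in zip(pred_a, pred_b)]


def rmse(y: Sequence[float], pred: Sequence[float]) -> float:
    return math.sqrt(sum((yi - pi) ** 2 for yi, pi in zip(y, pred)) / len(y))


# Toy validation-set yields and base-model predictions (made-up numbers, not from the study).
y_val = [50.0, 55.0, 60.0, 45.0]
cnn_val = [52.0, 54.0, 61.0, 47.0]
cnn_lstm_val = [47.0, 57.0, 57.0, 44.0]

w = fit_gem_weight(y_val, cnn_val, cnn_lstm_val)
ensemble_val = gem_predict(w, cnn_val, cnn_lstm_val)
print(f"w_CNN = {w:.3f}, RMSE: CNN={rmse(y_val, cnn_val):.3f}, "
      f"CNN-LSTM={rmse(y_val, cnn_lstm_val):.3f}, GEM={rmse(y_val, ensemble_val):.3f}")
```

Because w = 0 and w = 1 are both feasible, the fitted ensemble can never do worse than either base model on the data used to fit it; a fair comparison, as in the study, reports errors on a separate test set. With more than two base models, the same constrained least-squares problem would need a numerical solver instead of this closed form.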
In addition, we explored the impact of incorporating state-level soil data alongside the weather, location, genotype, and year variables. Because of data constraints, including the absence of latitude and longitude information, we used the same soil variables for all locations within a state, which limited our spatial information to the state level. Our findings suggest that integrating state-level soil variables did not substantially improve the models' predictive capabilities. We also performed a feature importance analysis based on the change in RMSE to identify crucial predictors. Location showed the highest RMSE change, followed by genotype and year. Among the weather variables, maximum direct normal irradiance (MDNI) and average precipitation (AP) showed higher RMSE changes, indicating their importance.
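The abstract says importance was measured as a change in RMSE but does not say how each feature was perturbed. The sketch below assumes a permutation-style procedure: shuffle one feature column, re-predict, and record how much RMSE rises over the unperturbed baseline. The function `rmse_change_importance` and the toy model are hypothetical. For a daily weather variable, the whole 214-day sequence would presumably be permuted across records as one unit, whereas this sketch treats every column as a scalar.

```python
import math
import random
from typing import Callable, List, Sequence

Row = Sequence[float]


def rmse(y: Sequence[float], pred: Sequence[float]) -> float:
    return math.sqrt(sum((yi - pi) ** 2 for yi, pi in zip(y, pred)) / len(y))


def rmse_change_importance(
    model: Callable[[Row], float],
    X: Sequence[Row],
    y: Sequence[float],
    n_repeats: int = 5,
    seed: int = 0,
) -> List[float]:
    """Average RMSE increase when each feature column is shuffled (assumed permutation scheme)."""
    rng = random.Random(seed)
    baseline = rmse(y, [model(row) for row in X])
    changes: List[float] = []
    for j in range(len(X[0])):
        total = 0.0
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            permuted = [list(row[:j]) + [v] + list(row[j + 1:]) for row, v in zip(X, column)]
            total += rmse(y, [model(row) for row in permuted]) - baseline
        changes.append(total / n_repeats)
    return changes


# Toy model that uses only feature 0 (illustrative, not one of the study's models).
def toy_model(row: Row) -> float:
    return 3.0 * row[0]


X_toy = [[float(i), float(i % 3)] for i in range(20)]
y_toy = [3.0 * row[0] for row in X_toy]
importance = rmse_change_importance(toy_model, X_toy, y_toy)
print(importance)
```

A feature the model ignores yields an RMSE change of exactly zero, while shuffling a feature the model relies on inflates RMSE. Larger values therefore mark more influential predictors, which is how location, genotype, and year rank above the weather variables in the reported analysis.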
