Prediction of Maize Phenotypic Traits With Genomic and Environmental Predictors Using Gradient Boosting Frameworks.

Author information

Westhues Cathy C, Mahone Gregory S, da Silva Sofia, Thorwarth Patrick, Schmidt Malthe, Richter Jan-Christoph, Simianer Henner, Beissinger Timothy M

Affiliations

Division of Plant Breeding Methodology, Department of Crop Sciences, University of Goettingen, Goettingen, Germany.

Center for Integrated Breeding Research, University of Goettingen, Goettingen, Germany.

Publication information

Front Plant Sci. 2021 Nov 11;12:699589. doi: 10.3389/fpls.2021.699589. eCollection 2021.

Abstract

The development of crop varieties with stable performance under future environmental conditions is a critical challenge in the context of climate change. Environmental data collected at the field level, such as soil and climatic information, can help improve the predictive ability of genomic prediction models by describing genotype-by-environment interactions more precisely; these interactions are a key component of the phenotypic response of complex agronomic traits. Modern predictive modeling approaches can efficiently handle diverse data types and capture complex nonlinear relationships in large datasets, and machine learning techniques in particular have attracted substantial interest in recent years. Here we examined the predictive ability of machine learning-based models for two phenotypic traits in maize using data collected by the Maize Genomes to Fields (G2F) Initiative. The data we analyzed consisted of multi-environment trials (METs) dispersed across the United States and Canada from 2014 to 2017. An assortment of soil- and weather-related variables was derived and used in prediction models alongside genotypic data. Linear random effects models were compared to a linear regularized regression method and to two nonlinear gradient boosting methods based on decision tree algorithms. These models were evaluated under four prediction problems: (1) tested and new genotypes in a new year; (2) only unobserved genotypes in a new year; (3) tested and new genotypes in a new site; (4) only unobserved genotypes in a new site. Including environmental predictors with gradient boosting methods improved the accuracy of forecasting grain yield performance of new genotypes in a new year by up to 20% over the baseline model. For plant height, neither machine learning-based methods nor detailed environmental information improved predictive ability.
An investigation of key environmental factors using gradient boosting frameworks also revealed that temperature at the flowering stage, the frequency and amount of water received during the vegetative and grain-filling stages, and soil organic matter content emerged as important predictors of grain yield in our panel of environments.
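The four prediction problems differ only in which records are withheld for testing. The following is a minimal sketch, assuming each phenotypic record carries a genotype, year, and site label. The `Plot` record, the scheme names, and the rule of dropping already-seen genotypes from the test set are hypothetical choices for illustration, not the study's code. Predictive ability is measured here as the Pearson correlation between observed and predicted values, a common convention in genomic prediction that is assumed rather than taken from the abstract.

```python
import math
from dataclasses import dataclass

# The four prediction problems from the abstract, keyed by hypothetical names:
# (field that defines the held-out environment, test only unobserved genotypes?)
SCHEMES = {
    "new_year": ("year", False),           # (1) tested and new genotypes, new year
    "new_year_new_geno": ("year", True),   # (2) only unobserved genotypes, new year
    "new_site": ("site", False),           # (3) tested and new genotypes, new site
    "new_site_new_geno": ("site", True),   # (4) only unobserved genotypes, new site
}


@dataclass(frozen=True)
class Plot:
    """One phenotypic record (hypothetical layout, not the G2F file format)."""
    genotype: str
    year: int
    site: str
    value: float


def split(plots, scheme, held_out):
    """Split records into (train, test) for one prediction problem.

    held_out is the withheld year (year schemes) or site (site schemes).
    """
    if scheme not in SCHEMES:
        raise ValueError(f"unknown scheme: {scheme!r}")
    field, unobserved_only = SCHEMES[scheme]
    train = [p for p in plots if getattr(p, field) != held_out]
    test = [p for p in plots if getattr(p, field) == held_out]
    if unobserved_only:
        # Assumption: keep only test genotypes never seen in the training environments.
        seen = {p.genotype for p in train}
        test = [p for p in test if p.genotype not in seen]
    return train, test


def predictive_ability(observed, predicted):
    """Pearson correlation between observed and predicted values."""
    n = len(observed)
    if n < 2 or n != len(predicted):
        raise ValueError("need two equal-length sequences with at least 2 values")
    mo = sum(observed) / n
    mp = sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    so = math.sqrt(sum((o - mo) ** 2 for o in observed))
    sp = math.sqrt(sum((p - mp) ** 2 for p in predicted))
    return cov / (so * sp)  # raises ZeroDivisionError if either input is constant


if __name__ == "__main__":
    # Toy records with made-up values, not data from the study.
    plots = [
        Plot("G1", 2014, "IA", 9.1),
        Plot("G2", 2014, "NE", 8.4),
        Plot("G1", 2015, "IA", 9.6),
        Plot("G3", 2015, "NE", 7.9),
    ]
    train, test = split(plots, "new_year_new_geno", 2015)
    print([p.genotype for p in test])  # ['G3']: G1 was already observed in 2014
```

Filtering the test set, rather than removing test genotypes from the training set, is one way to make the "unobserved genotypes" problems genotype-disjoint. The study may have used a different rule.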
