Crossa José, Montesinos-Lopez Osval A, Costa-Neto Germano, Vitale Paolo, Martini Johannes W R, Runcie Daniel, Fritsche-Neto Roberto, Montesinos-Lopez Abelardo, Pérez-Rodríguez Paulino, Gerard Guillermo, Dreisigacker Susanna, Crespo-Herrera Leonardo, Saint Pierre Carolina, Lillemo Morten, Cuevas Jaime, Bentley Alison, Ortiz Rodomiro
Louisiana State University, College of Agriculture, Baton Rouge, LA, USA; Colegio de Postgraduados, Montecillos, CP 56230, Estado de México, Mexico; International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz Km 45, El Batán, Texcoco, CP 56237, Estado de México, Mexico; Department of Statistics and Operations Research and Distinguished Scientist Fellowship Program, King Saud University, Riyadh 11451, Saudi Arabia.
Facultad de Telemática, Universidad de Colima, CP 28040 Estado de Colima, Mexico.
Trends Plant Sci. 2025 Feb;30(2):167-184. doi: 10.1016/j.tplants.2024.09.011. Epub 2024 Oct 26.
Statistical machine learning (ML) extracts patterns from extensive genomic, phenotypic, and environmental data. ML algorithms automatically identify relevant features and use cross-validation to ensure robust models and improve prediction reliability in new lines. Furthermore, ML analyses of genotype-by-environment (G×E) interactions can offer insights into the genetic factors that affect performance in specific environments. By leveraging historical breeding data, ML streamlines strategies and automates analyses to reveal genomic patterns. In this review we examine the transformative impact of big data, including multi-trait genomics, phenomics, and environmental covariables, on genomic-enabled prediction in plant breeding. We discuss how big data and ML are revolutionizing the field by enhancing prediction accuracy, deepening our understanding of G×E interactions, and optimizing breeding strategies through the analysis of extensive and diverse datasets.
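As a rough illustration of the cross-validation idea mentioned in the abstract, the following minimal sketch estimates how well a marker-based model predicts lines it has not seen. Everything in it is an assumption made for illustration and is not taken from the review: the data are simulated, ridge regression stands in for whichever ML model is actually used, and the names `ridge_fit`, `kfold_accuracy`, and `simulate` are hypothetical helpers.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression; returns (intercept, marker effects)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Solve (Xc'Xc + lam * I) beta = Xc'yc on centered data.
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    return y_mean - x_mean @ beta, beta

def kfold_accuracy(X, y, k=5, lam=1.0, seed=0):
    """Correlation between predicted and observed phenotypes in each held-out fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    accs = []
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False  # held-out lines never enter the fit
        b0, beta = ridge_fit(X[train_mask], y[train_mask], lam)
        pred = b0 + X[test_idx] @ beta
        accs.append(np.corrcoef(pred, y[test_idx])[0, 1])
    return np.array(accs)

def simulate(n_lines=300, n_markers=300, n_qtl=20, h2=0.5, seed=1):
    """Simulated 0/1/2 marker matrix and a phenotype with heritability near h2."""
    rng = np.random.default_rng(seed)
    X = rng.integers(0, 3, size=(n_lines, n_markers)).astype(float)
    effects = np.zeros(n_markers)
    effects[rng.choice(n_markers, n_qtl, replace=False)] = rng.normal(size=n_qtl)
    g = X @ effects  # simulated genetic values
    noise_sd = np.sqrt(g.var() * (1 - h2) / h2)
    y = g + rng.normal(scale=noise_sd, size=n_lines)
    return X, y

X, y = simulate()
acc = kfold_accuracy(X, y, k=5, lam=200.0)
print("per-fold accuracy:", acc.round(2))
print("mean accuracy:", acc.mean().round(2))
```

The per-fold correlations give a sense of how variable the accuracy estimate is. In practice the penalty `lam` would itself be tuned inside each training fold rather than fixed in advance, and folds might be defined by family or environment rather than at random.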