Danilevicz Monica F, Gill Mitchell, Anderson Robyn, Batley Jacqueline, Bennamoun Mohammed, Bayer Philipp E, Edwards David
School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia.
School of Physics, Mathematics and Computing, University of Western Australia, Perth, WA, Australia.
Front Genet. 2022 May 18;13:822173. doi: 10.3389/fgene.2022.822173. eCollection 2022.
Genomic prediction tools based on statistical methods, such as genomic best linear unbiased prediction (GBLUP), support crop breeding. However, these tools are not designed to capture non-linear relationships within multi-dimensional datasets, or to handle high-dimensional datasets such as imagery collected by unmanned aerial vehicles. Machine learning (ML) algorithms have the potential to surpass the prediction accuracy of current tools used for genotype-to-phenotype prediction, owing to their capacity to autonomously extract data features and represent their relationships at multiple levels of abstraction. This review addresses the challenges of applying statistical and machine learning methods to predict phenotypic traits from genetic markers, environment data, and imagery for crop breeding. We present the advantages and disadvantages of explainable model structures and discuss the potential of machine learning models for genotype-to-phenotype prediction in crop breeding, along with the challenges involved, including the scarcity of high-quality datasets, inconsistent metadata annotation, and the requirements of ML models.