Galli Giovanni, Sabadin Felipe, Yassue Rafael Massahiro, Galves Cassia, Carvalho Humberto Fanelli, Crossa Jose, Montesinos-López Osval Antonio, Fritsche-Neto Roberto
Department of Genetics, Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, Brazil.
School of Plant and Environmental Sciences, Virginia Tech, Blacksburg, VA, United States.
Front Plant Sci. 2022 Mar 7;13:845524. doi: 10.3389/fpls.2022.845524. eCollection 2022.
Machine learning methods such as multilayer perceptrons (MLP) and convolutional neural networks (CNN) have emerged as promising approaches to genomic prediction (GP). We assess the performance of MLP and CNN on regression and classification tasks in a case study with maize hybrids. The genomic information was provided to the MLP as a relationship matrix and to the CNN as "genomic images." In the regression task, the two machine learning models were compared with each other and with GBLUP. In the classification task, MLP and CNN were compared with each other. For classification, the traits (plant height and grain yield) were discretized to create balanced (moderate selection intensity) and unbalanced (extreme selection intensity) datasets. An automatic hyperparameter search was performed for MLP and CNN, and the best models were reported. For both task types, several metrics were calculated under a validation scheme to assess the effect of the prediction method and other variables. Overall, MLP and CNN produced results competitive with GBLUP. We also offer new insights into automated machine learning for genomic prediction and its implications for plant breeding.
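The discretization step described in the abstract can be illustrated with a minimal sketch. It assumes that a continuous trait is turned into a binary "selected / not selected" label by thresholding at a quantile, so that a moderate selection intensity gives balanced classes and an extreme one gives unbalanced classes. The function name `discretize_by_selection`, the 50% and 10% selected fractions, and the simulated trait values are all hypothetical; the abstract does not state the thresholds the authors used.

```python
import numpy as np


def discretize_by_selection(values, selected_fraction):
    """Label the top `selected_fraction` of individuals as 1 and the rest as 0.

    Hypothetical helper: the quantile-threshold rule is an assumption,
    not the procedure confirmed by the abstract.
    """
    values = np.asarray(values, dtype=float)
    # Threshold at the (1 - selected_fraction) quantile of the trait.
    threshold = np.quantile(values, 1.0 - selected_fraction)
    return (values > threshold).astype(int)


# Simulated trait values for illustration only (not data from the study).
rng = np.random.default_rng(0)
grain_yield = rng.normal(loc=8.0, scale=1.0, size=1000)

# Assumed fractions: 50% selected (balanced) and 10% selected (unbalanced).
balanced = discretize_by_selection(grain_yield, 0.50)
unbalanced = discretize_by_selection(grain_yield, 0.10)

print("balanced class share:  ", balanced.mean())
print("unbalanced class share:", unbalanced.mean())
```

Under these assumptions, the balanced labels split the hybrids roughly evenly, while the unbalanced labels mark only a small top fraction as selected, which is the situation in which accuracy alone becomes a weak metric for a classifier.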