Montesinos-López Osval A, Montesinos-López Abelardo, Mosqueda-González Brandon A, Bentley Alison R, Lillemo Morten, Varshney Rajeev K, Crossa José
Facultad de Telemática, Universidad de Colima, Colima, Mexico.
Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, Guadalajara, Mexico.
Front Genet. 2021 Dec 17;12:798840. doi: 10.3389/fgene.2021.798840. eCollection 2021.
Genomic selection (GS) has the potential to revolutionize predictive plant breeding. A reference population is phenotyped and genotyped to train a statistical model, which is then used to make genome-enabled predictions for new individuals that have only been genotyped. Deep neural networks, a type of machine learning model, have been widely adopted in GS studies because they are nonparametric methods, which makes them more adept at capturing nonlinear patterns. However, training deep neural networks is very challenging because of the numerous hyperparameters that must be tuned, and imperfect tuning can result in biased predictions. In this paper we propose a simple method for calibrating (adjusting) the predictions of continuous response variables produced by deep learning applications. We evaluated the proposed deep learning calibration method (DL_M2) using four crop breeding data sets and compared its performance with that of the standard deep learning method (DL_M1) and the standard genomic Best Linear Unbiased Predictor (GBLUP). Although GBLUP was the most accurate model overall, the proposed calibration method (DL_M2) increased genome-enabled prediction performance in all data sets relative to the traditional DL method (DL_M1). Taken together, these results provide evidence for extending the use of the proposed calibration method in order to evaluate its potential and consistency for prediction performance in the context of GS applied to plant breeding.
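The abstract does not describe how DL_M2 adjusts the predictions, so the following is only an illustrative sketch of one generic way to calibrate continuous predictions, not the authors' procedure. It assumes a simple linear recalibration: observed values are regressed on the raw predictions in a held-out tuning set, and the fitted line is applied to new predictions. The function names and the synthetic data are hypothetical.

```python
import random

def fit_linear_calibration(y_obs, y_pred):
    """Least-squares fit of y_obs ~ intercept + slope * y_pred (assumed approach)."""
    n = len(y_obs)
    mean_p = sum(y_pred) / n
    mean_o = sum(y_obs) / n
    sxx = sum((p - mean_p) ** 2 for p in y_pred)
    sxy = sum((p - mean_p) * (o - mean_o) for p, o in zip(y_pred, y_obs))
    slope = sxy / sxx
    intercept = mean_o - slope * mean_p
    return intercept, slope

def apply_calibration(y_pred, intercept, slope):
    """Adjust raw predictions with the fitted line."""
    return [intercept + slope * p for p in y_pred]

def mse(y_obs, y_pred):
    """Mean squared error between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(y_obs, y_pred)) / len(y_obs)

# Synthetic stand-in for a biased predictor: predictions are shrunken and shifted.
rng = random.Random(0)
y_true = [rng.gauss(5.0, 1.0) for _ in range(200)]
raw_pred = [0.5 * y + 1.0 + rng.gauss(0.0, 0.1) for y in y_true]

# Fit on the first half (tuning set), evaluate on the second half (test set).
intercept, slope = fit_linear_calibration(y_true[:100], raw_pred[:100])
calibrated = apply_calibration(raw_pred[100:], intercept, slope)

mse_raw = mse(y_true[100:], raw_pred[100:])
mse_cal = mse(y_true[100:], calibrated)
print(f"MSE raw: {mse_raw:.3f}, MSE calibrated: {mse_cal:.3f}")
```

In this toy setting the calibration is fitted only on data not used for evaluation, which mirrors the usual practice of keeping the testing set out of any tuning step.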