Crossa José, Martini Johannes W R, Gianola Daniel, Pérez-Rodríguez Paulino, Jarquin Diego, Juliana Philomin, Montesinos-López Osval, Cuevas Jaime
Biometrics and Statistics Unit, Genetic Resources Program, and Global Wheat Program, International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico.
Programa de Postgrado de Socioeconomia, Estadistica e Informatica, Colegio de Postgraduados, Texcoco, Mexico.
Front Genet. 2019 Dec 9;10:1168. doi: 10.3389/fgene.2019.01168. eCollection 2019.
Deep learning (DL) is a promising method for genomic-enabled prediction. However, DL is difficult to implement because many hyperparameters (number of hidden layers, number of neurons, learning rate, number of epochs, batch size, etc.) must be tuned. For this reason, deep kernel methods, which only require defining the number of layers, may be an attractive alternative. Deep kernel methods emulate DL models with a large number of neurons but are defined by covariance matrices that are relatively easy to compute. In this research, we compared the genome-based prediction of DL to a deep kernel (arc-cosine kernel, AK), to the commonly used non-additive Gaussian kernel (GK), and to the conventional additive genomic best linear unbiased predictor (GBLUP/GB). We benchmarked these methods on two real wheat data sets. On average, AK and GK outperformed DL and GB. The gain in prediction performance of AK and GK over DL and GB was not large, but AK and GK have the advantage that each requires tuning only one parameter: the number of layers (AK) or the bandwidth parameter (GK). Furthermore, although AK and GK performed similarly, the deep kernel AK is easier to implement than GK, since the number of layers is more easily determined than the bandwidth parameter of GK. For the 2015-2016 data set, the difference in performance between AK and DL was larger, with AK predicting much better than DL. On these data, optimizing the DL hyperparameters was difficult, and the parameters finally used may have been suboptimal. Our results suggest that AK is a good alternative to DL, with the advantage that practically no tuning is required.
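The three kernels compared above can be illustrated with a minimal NumPy sketch. It rests on several assumptions that are not taken from the paper: markers are standardized, GB is taken as the linear relationship matrix Z Z^T / p, GK uses the form exp(-h d_ij^2 / median(d^2)) with bandwidth h, and AK follows the degree-1 arc-cosine recursion of Cho and Saul, starting from Z Z^T / p. Prediction uses a kernel BLUP with a fixed variance ratio `lam` instead of the Bayesian or REML variance estimation a real analysis would use. All function names are hypothetical, and the data are simulated, not the wheat data sets.

```python
import numpy as np

def standardize_markers(X):
    """Center and scale each marker column (zero-variance columns are only centered)."""
    X = np.asarray(X, dtype=float)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0
    return (X - X.mean(axis=0)) / sd

def gb_kernel(Z):
    """Assumed additive GBLUP-style relationship matrix: G = Z Z^T / p."""
    return Z @ Z.T / Z.shape[1]

def gaussian_kernel(Z, h=1.0):
    """Assumed Gaussian kernel exp(-h * d_ij^2 / median(d^2)); h is the bandwidth to tune."""
    sq = np.sum(Z**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2 * Z @ Z.T, 0.0)  # squared Euclidean distances
    np.fill_diagonal(d2, 0.0)
    q = np.median(d2[np.triu_indices_from(d2, k=1)])  # scale by median off-diagonal distance
    return np.exp(-h * d2 / q)

def arc_cosine_kernel(Z, n_layers=1):
    """Degree-1 arc-cosine kernel applied recursively; n_layers is the only tuning parameter."""
    K = Z @ Z.T / Z.shape[1]  # layer-0 inner products (scaling by p is an assumption)
    for _ in range(n_layers):
        d = np.sqrt(np.diag(K))
        norms = np.outer(d, d)
        cos_theta = np.clip(K / norms, -1.0, 1.0)
        theta = np.arccos(cos_theta)
        J = np.sin(theta) + (np.pi - theta) * cos_theta
        K = norms * J / np.pi  # diagonal is preserved because J(0) = pi
    return K

def kernel_blup_predict(K, y, train, test, lam=1.0):
    """Kernel BLUP with a fixed shrinkage lam = sigma_e^2 / sigma_u^2 (simplifying assumption)."""
    mu = y[train].mean()
    K_tt = K[np.ix_(train, train)]
    alpha = np.linalg.solve(K_tt + lam * np.eye(len(train)), y[train] - mu)
    return mu + K[np.ix_(test, train)] @ alpha

# Simulated example: 200 lines, 500 markers coded 0/1/2 (not the wheat data).
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(200, 500))
Z = standardize_markers(X)
y = Z @ rng.normal(0, 0.1, size=500) + rng.normal(0, 1.0, size=200)
train, test = np.arange(150), np.arange(150, 200)

# Tune the number of AK layers on an inner validation split of the training set.
inner_train, inner_val = train[:120], train[120:]

def val_corr(K):
    p = kernel_blup_predict(K, y, inner_train, inner_val)
    return np.corrcoef(p, y[inner_val])[0, 1]

best_layers = max(range(1, 6), key=lambda l: val_corr(arc_cosine_kernel(Z, l)))

results = {}
for name, K in [("GB", gb_kernel(Z)),
                ("GK", gaussian_kernel(Z, h=1.0)),
                ("AK", arc_cosine_kernel(Z, n_layers=best_layers))]:
    pred = kernel_blup_predict(K, y, train, test)
    results[name] = np.corrcoef(pred, y[test])[0, 1]  # predictive correlation
    print(name, round(results[name], 3))
```

The loop over `n_layers` mirrors the point made in the abstract: AK has a single discrete parameter that can be chosen by a small search, whereas GK's continuous bandwidth `h` would need a finer grid, and DL would need a search over many hyperparameters at once.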