Montesinos-López Osval Antonio, Montesinos-López José Cricelio, Singh Pawan, Lozano-Ramirez Nerida, Barrón-López Alberto, Montesinos-López Abelardo, Crossa José
Facultad de Telemática, Universidad de Colima, Colima, 28040, México.
Departamento de Estadística, Centro de Investigación en Matemáticas (CIMAT), Guanajuato, 36023, México.
G3 (Bethesda). 2020 Nov 5;10(11):4177-4190. doi: 10.1534/g3.120.401631.
Genomic selection (GS) is a revolutionary paradigm for developing new plants and animals. It is a predictive methodology, since it relies on learning methods to perform its task. Unfortunately, no universal model suits all types of predictions, so specific methodologies are required for each type of output (response variable). Because efficient methodologies for multivariate count outcomes are lacking, this paper proposes a multivariate Poisson deep neural network (MPDN) model for the simultaneous genomic prediction of several count outcomes. The MPDN model uses the negative log-likelihood of a Poisson distribution as its loss function, the rectified linear unit (ReLU) activation function in the hidden layers to capture nonlinear patterns, and the exponential activation function in the output layer to produce outputs on the same scale as the counts. The proposed MPDN model was compared to conventional generalized Poisson regression models and univariate Poisson deep learning models on two experimental count data sets. We found that the proposed MPDN model outperformed the univariate Poisson deep neural network models but did not outperform, in terms of prediction, the univariate generalized Poisson regression models. All deep learning models were implemented with TensorFlow as back-end and Keras as front-end, which allows these models to be applied to moderate and large data sets, a significant advantage over previous GS models for multivariate count data.
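The three ingredients named in the abstract (ReLU hidden layers, an exponential output layer, and a Poisson negative log-likelihood loss) can be illustrated with a minimal pure-Python sketch. This is not the authors' TensorFlow/Keras implementation: the function names, the layer sizes, and every weight value below are hypothetical and chosen only to make the forward pass and the loss concrete.

```python
import math

def relu(x):
    # Rectified linear unit: passes positive values, zeroes out the rest.
    return max(0.0, x)

def dense(inputs, weights, biases, activation):
    # One fully connected layer; each row of `weights` feeds one output unit.
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def poisson_nll(y_true, y_pred):
    # Mean Poisson negative log-likelihood over the count traits:
    # mu - y * log(mu) + log(y!), with log(y!) computed as lgamma(y + 1).
    terms = [
        mu - y * math.log(mu) + math.lgamma(y + 1)
        for y, mu in zip(y_true, y_pred)
    ]
    return sum(terms) / len(terms)

def predict(markers, params):
    # Hidden layer with ReLU, then an exponential output layer so that
    # every predicted mean is strictly positive, as a Poisson mean must be.
    hidden = dense(markers, params["w_hidden"], params["b_hidden"], relu)
    return dense(hidden, params["w_out"], params["b_out"], math.exp)

# Hypothetical toy network: 3 marker inputs, 2 hidden units, 2 count traits.
params = {
    "w_hidden": [[0.2, -0.1, 0.3], [-0.4, 0.5, 0.1]],
    "b_hidden": [0.0, 0.1],
    "w_out": [[0.5, -0.2], [0.1, 0.3]],
    "b_out": [0.2, -0.1],
}

markers = [1.0, 0.0, 2.0]   # hypothetical marker codes for one line
observed = [2, 1]           # hypothetical observed counts for two traits

predicted = predict(markers, params)
loss = poisson_nll(observed, predicted)
print(predicted, loss)
```

Because a single network emits one predicted mean per trait and the loss averages over traits, all traits share the hidden-layer weights, which is what makes the sketch multivariate rather than a set of separate univariate models.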