Impact of multi-output and stacking methods on feed efficiency prediction from genotype using machine learning algorithms.

Author information

Mora Mónica, González Pablo, Quevedo José Ramón, Montañés Elena, Tusell Llibertat, Bergsma Rob, Piles Miriam

Affiliations

Departamento de Ciencia Animal, Universitat Politècnica de València, Valencia, Spain.

Animal Breeding and Genetics, Institute of Agrifood Research and Technology (IRTA), Barcelona, Spain.

Publication information

J Anim Breed Genet. 2023 Nov;140(6):638-652. doi: 10.1111/jbg.12815. Epub 2023 Jul 5.

Feeding represents the largest economic cost in meat production; therefore, selection to improve traits related to feed efficiency is a goal in most livestock breeding programs. Residual feed intake (RFI), that is, the difference between the actual feed intake and the feed intake expected from the animal's requirements, has been used as the selection criterion to improve feed efficiency since it was proposed by Koch in 1963. In growing pigs, it is computed as the residual of the multiple regression of daily feed intake (DFI) on average daily gain (ADG), backfat thickness (BFT), and metabolic body weight (MW). Recently, prediction using single-output machine learning algorithms with SNPs as predictor variables has been proposed for genomic selection in growing pigs but, as in other species, the prediction quality achieved for RFI has generally been poor. However, it has been suggested that it could be improved through multi-output or stacking methods. To test this, four strategies were implemented to predict RFI. Two of them compute RFI indirectly from the predicted values of its components, obtained from (i) individual predictions (multiple single-output strategy) or (ii) simultaneous predictions (multi-output strategy). The other two predict RFI directly, using (iii) the individual predictions of its components as predictor variables jointly with the genotype (stacking strategy), or (iv) only the genotypes as predictors (single-output strategy). The single-output strategy was considered the benchmark. This research aimed to test the first three strategies against this benchmark using data recorded from 5828 growing pigs and 45,610 SNPs. For all strategies, two different learning methods were fitted: random forest (RF) and support vector regression (SVR). A nested cross-validation (CV) with an outer 10-fold CV and an inner threefold CV for hyperparameter tuning was implemented to test all strategies.
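The nested cross-validation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data are synthetic, a closed-form ridge regression stands in for RF and SVR so that only NumPy is needed, and the names `kfold`, `nested_cv`, `ridge_fit`, `ridge_predict`, and `spearman` are hypothetical helpers introduced here.

```python
import numpy as np

def spearman(a, b):
    """Spearman correlation as the Pearson correlation of ranks (assumes no ties)."""
    rank_a = np.argsort(np.argsort(a))
    rank_b = np.argsort(np.argsort(b))
    return float(np.corrcoef(rank_a, rank_b)[0, 1])

def ridge_fit(X, y, lam):
    """Closed-form ridge regression: a stand-in learner, NOT the RF/SVR of the study."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return x_mean, y_mean, beta

def ridge_predict(model, X):
    x_mean, y_mean, beta = model
    return (X - x_mean) @ beta + y_mean

def kfold(indices, k, rng):
    """Yield (train, test) index arrays for a shuffled k-fold split of `indices`."""
    folds = np.array_split(rng.permutation(indices), k)
    for i, test in enumerate(folds):
        yield np.concatenate(folds[:i] + folds[i + 1:]), test

def nested_cv(X, y, lams, n_outer=10, n_inner=3, seed=0):
    """Outer folds estimate performance; inner folds tune the penalty `lam`."""
    rng = np.random.default_rng(seed)
    scores = []
    for train, test in kfold(np.arange(len(y)), n_outer, rng):
        inner = list(kfold(train, n_inner, rng))  # splits of the training part only

        def inner_mse(lam):
            errors = []
            for fit, val in inner:
                model = ridge_fit(X[fit], y[fit], lam)
                errors.append(np.mean((ridge_predict(model, X[val]) - y[val]) ** 2))
            return np.mean(errors)

        best_lam = min(lams, key=inner_mse)
        # Refit on the whole outer-training part, then score on the held-out fold.
        model = ridge_fit(X[train], y[train], best_lam)
        scores.append(spearman(ridge_predict(model, X[test]), y[test]))
    return np.array(scores)

# Synthetic example: 300 animals, 50 SNPs coded 0/1/2, 5 of them with an effect.
rng = np.random.default_rng(42)
X = rng.integers(0, 3, size=(300, 50)).astype(float)
effects = np.zeros(50)
effects[:5] = 0.5
y = X @ effects + rng.normal(0.0, 1.0, 300)

scores = nested_cv(X, y, lams=[0.1, 1.0, 10.0, 100.0])
print(f"Spearman over outer folds: mean={scores.mean():.2f}, SD={scores.std(ddof=1):.2f}")
```

The point of the nesting is that the hyperparameter is chosen without ever seeing the outer test fold, so the mean and SD reported over the outer folds are not biased by the tuning step.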
This scheme was repeated using as predictor variables different subsets with an increasing number (from 200 to 3000) of the most informative SNPs identified with RF. Results showed that the highest prediction performance was achieved with 1000 SNPs, although the stability of feature selection was poor (0.13 out of 1). For all SNP subsets, the benchmark showed the best prediction performance. Using RF as the learner and the 1000 most informative SNPs as predictors, the mean (SD) of the 10 values obtained in the test sets was 0.23 (0.04) for the Spearman correlation, 0.83 (0.04) for the zero-one loss, and 0.33 (0.03) for the rank distance loss. We conclude that the information on the predicted components of RFI (DFI, ADG, MW, and BFT) does not improve the quality of the prediction of this trait relative to that obtained with the single-output strategy.
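The definition of RFI used in this abstract, the residual of regressing DFI on ADG, BFT, and MW, can be illustrated with a short sketch. The phenotypes below are simulated with made-up means and coefficients, and `residual_feed_intake` is a hypothetical helper name; it is not code from the study.

```python
import numpy as np

def residual_feed_intake(dfi, adg, bft, mw):
    """Fit DFI ~ 1 + ADG + BFT + MW by ordinary least squares.

    Returns the fitted coefficients and the residuals, which are the RFI values.
    """
    design = np.column_stack([np.ones_like(dfi), adg, bft, mw])
    coef, *_ = np.linalg.lstsq(design, dfi, rcond=None)
    return coef, dfi - design @ coef

# Simulated phenotypes; all numbers are illustrative assumptions.
rng = np.random.default_rng(0)
n = 500
adg = rng.normal(0.9, 0.1, n)    # average daily gain
bft = rng.normal(12.0, 2.0, n)   # backfat thickness
mw = rng.normal(30.0, 3.0, n)    # metabolic body weight
dfi = 0.3 + 1.2 * adg + 0.02 * bft + 0.03 * mw + rng.normal(0.0, 0.15, n)

coef, rfi = residual_feed_intake(dfi, adg, bft, mw)
print("intercept and slopes (ADG, BFT, MW):", np.round(coef, 3))
print("RFI mean:", float(rfi.mean()))
```

Because the regression includes an intercept, the RFI values average to zero and are uncorrelated with ADG, BFT, and MW in the data used to fit the model, which is why RFI is read as feed intake not explained by growth, fatness, and body size.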

