Comparing gradient boosting machine and Bayesian threshold BLUP for genome-based prediction of categorical traits in wheat breeding.

Affiliations

Facultad de Telemática, Univ. de Colima, Colima, Colima, 28040, México.

Univ. Tecnológica de Manzanillo, Manzanillo, Colima, México.

Publication information

Plant Genome. 2022 Sep;15(3):e20214. doi: 10.1002/tpg2.20214. Epub 2022 May 10.

DOI:10.1002/tpg2.20214

PMID:35535459

Abstract

Genomic selection (GS) is a predictive methodology that is changing plant breeding. It trains a statistical machine-learning model on available phenotypic and genotypic data and then uses that model to make predictions for individuals that were only genotyped. Several statistical machine-learning methods are therefore being implemented in GS, but to improve the selection of new genotypes early in the prediction process, the exploration of new statistical machine-learning algorithms must continue. In this paper, we benchmarked the Bayesian threshold genomic best linear unbiased predictor model (TGBLUP; popular in GS) against the gradient boosting machine (GBM). The comparison used four real wheat (Triticum aestivum L.) data sets with categorical traits, and prediction performance was measured in the testing set with two metrics: the proportion of cases correctly classified (PCCC) and the Kappa coefficient. Under 10 random partitions with four testing proportions (20, 40, 60, and 80%), the GBM outperformed the TGBLUP model on both metrics (PCCC and Kappa coefficient) in three of the four data sets. In the larger data sets (Data Sets 3 and 4), the GBM's gain in prediction accuracy was considerable. For this reason, we encourage more research on the GBM to evaluate its prediction performance in the context of GS.
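The two metrics and the random-partition scheme named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: it assumes that the "Kappa coefficient" is Cohen's Kappa and that partitions are simple unstratified random splits, and the function names `pccc`, `cohen_kappa`, and `random_partitions` are hypothetical.

```python
import random
from collections import Counter


def pccc(y_true, y_pred):
    """Proportion of cases correctly classified (plain accuracy)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's Kappa (assumed here): agreement corrected for chance."""
    n = len(y_true)
    p_obs = pccc(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: sum over classes of the product of marginal frequencies.
    p_exp = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_exp == 1.0:
        # Kappa is undefined when only one class occurs; 0.0 is a convention
        # chosen for this sketch, not something stated in the abstract.
        return 0.0
    return (p_obs - p_exp) / (1.0 - p_exp)


def random_partitions(n, test_prop, n_parts=10, seed=0):
    """Yield (train_idx, test_idx) pairs for n_parts random splits.

    Assumption: unstratified shuffling; the abstract does not say how
    the partitions were drawn.
    """
    rng = random.Random(seed)
    n_test = round(n * test_prop)
    for _ in range(n_parts):
        idx = list(range(n))
        rng.shuffle(idx)
        yield idx[n_test:], idx[:n_test]
```

To mirror the design in the abstract, one would loop over the testing proportions 0.2, 0.4, 0.6, and 0.8, fit each model on the training indices of every partition, and average `pccc` and `cohen_kappa` over the 10 test sets.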
