A Comparison between Three Tuning Strategies for Gaussian Kernels in the Context of Univariate Genomic Prediction.

Affiliations

Facultad de Telemática, Universidad de Colima, Colima 28040, Mexico.

Department of Crop and Soil Sciences, Washington State University, Pullman, WA 99164, USA.

Publication Information

Genes (Basel). 2022 Dec 3;13(12):2282. doi: 10.3390/genes13122282.

PMID: 36553547

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9778581/

Abstract

Genomic prediction is revolutionizing plant breeding because candidate genotypes can be selected without measuring their traits in the field. When a reference population has both phenotypic and genotypic information, it is used to train a statistical machine learning model, which is then used to predict the breeding or phenotypic values of candidate genotypes that have only been genotyped. Nevertheless, the successful implementation of the genomic selection (GS) methodology depends on many factors. One key factor is the type of statistical machine learning method used, since some methods cannot capture nonlinear patterns in the data. Kernel methods are powerful statistical machine learning algorithms that capture complex nonlinear patterns, but their successful implementation depends strongly on careful tuning of the hyperparameters involved. Therefore, in this paper we compare three tuning methods (manual tuning, grid search, and Bayesian optimization) for the Gaussian kernel under a Bayesian best linear unbiased predictor model. We used six real wheat (Triticum aestivum L.) datasets to compare the three tuning strategies. We found that careful tuning is essential to obtain the main benefits of Gaussian kernels. The best prediction performance was obtained when tuning was performed with grid search or Bayesian optimization; however, we observed no relevant differences between these two approaches. The observed gains in prediction performance ranged from 2.1% to 27.8% across the six datasets under study.
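
The abstract's central point, that Gaussian-kernel accuracy hinges on tuning its bandwidth, can be illustrated with a minimal grid-search sketch. It is not the authors' code. It assumes the commonly used parameterization K_ij = exp(-gamma * ||x_i - x_j||^2 / median squared distance), and it substitutes kernel ridge regression with a fixed variance ratio `lam` for the paper's Bayesian model fit (with known variance components, this approximates the BLUP of genetic values). The helper names, the 5-fold cross-validation, and the Pearson-correlation criterion are illustrative assumptions.

```python
import numpy as np


def sq_dists(A, B):
    """Pairwise squared Euclidean distances between rows of A and rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.maximum(sq, 0.0)  # clip tiny negative values caused by round-off


def gaussian_kernel(A, B, gamma, scale):
    """Assumed Gaussian kernel: exp(-gamma * squared distance / scale)."""
    return np.exp(-gamma * sq_dists(A, B) / scale)


def kernel_blup_predict(X_tr, y_tr, X_te, gamma, scale, lam):
    """Kernel-ridge stand-in for GBLUP: mu + K_te,tr (K_tr + lam*I)^-1 (y - mu)."""
    K = gaussian_kernel(X_tr, X_tr, gamma, scale)
    mu = y_tr.mean()
    alpha = np.linalg.solve(K + lam * np.eye(len(y_tr)), y_tr - mu)
    return mu + gaussian_kernel(X_te, X_tr, gamma, scale) @ alpha


def grid_search_gamma(X, y, gammas, lam=1.0, n_folds=5, seed=0):
    """Return the gamma with the highest mean cross-validated Pearson correlation."""
    n = len(y)
    # Scale distances by their median (uses markers only, no phenotypes).
    scale = np.median(sq_dists(X, X)[np.triu_indices(n, k=1)])
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    scores = {}
    for g in gammas:
        cors = []
        for te in folds:
            tr = np.setdiff1d(np.arange(n), te)
            pred = kernel_blup_predict(X[tr], y[tr], X[te], g, scale, lam)
            cors.append(np.corrcoef(pred, y[te])[0, 1])
        scores[g] = float(np.mean(cors))
    best = max(scores, key=scores.get)
    return best, scores
```

In this sketch, manual tuning corresponds to fixing a single `gamma` without searching. Bayesian optimization would replace the exhaustive loop over `gammas` with a surrogate model that proposes the next value to evaluate. The ratio `lam` is held fixed here for simplicity, although it could be added to the search as well.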
