Statistics Department, Federal University of Viçosa, Viçosa, Minas Gerais, Brazil.
Horticultural Sciences Department, Blueberry Breeding and Genomics Lab, University of Florida, Gainesville, FL, USA.
Theor Appl Genet. 2023 Dec 15;137(1):9. doi: 10.1007/s00122-023-04512-w.
An approach for handling visual scores, which are subject to scoring errors and evaluator subjectivity, was evaluated in simulated and blueberry recurrent selection breeding schemes to assist breeders in their decision-making. Owing to their simplicity and ease of implementation, most genomic prediction methods rely on the assumption that phenotypes are normally distributed. However, in plant and animal breeding, continuous traits are often visually scored as categorical traits and then analyzed as Gaussian variables. This violates the normality assumption and could affect the prediction of breeding values and the estimation of genetic parameters. In this study, we examined the main challenges that visual scores pose for genomic prediction and genetic parameter estimation using mixed models, Bayesian, and machine learning methods. We evaluated these approaches using simulated and real breeding data sets. Our contribution in this study is a five-fold demonstration: (i) collecting data using an intermediate number of categories (1-3 and 1-5) is the best strategy, even when accounting for the errors associated with visual scores; (ii) Linear Mixed Models and Bayesian Linear Regression are robust to the normality violation, but marginal gains can be achieved with Bayesian Ordinal Regression Models (BORM) and Random Forest Classification; (iii) genetic parameters are better estimated using BORM; (iv) our conclusions from simulated data also apply to real data in autotetraploid blueberry; and (v) a comparison of continuous and categorical phenotypes found that, when collecting continuous phenotypes is not feasible, investing in the evaluation of 600-1000 categorical data points with low error is a strategy for improving predictive abilities. Our findings suggest the best approaches for effectively using visual score traits to explore genetic information in breeding programs and highlight the importance of investing in the training of evaluator teams and in high-quality phenotyping.