Centro de Biotecnología y Genómica de Plantas (CBGP, UPM-INIA), Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Universidad Politécnica de Madrid (UPM), Campus de Montegancedo-UPM, 28223, Pozuelo de Alarcón, Madrid, Spain.
Institute for Sustainable Agriculture, Spanish Research Council (CSIC), Córdoba, Spain.
Theor Appl Genet. 2021 Nov;134(11):3595-3609. doi: 10.1007/s00122-021-03916-w. Epub 2021 Aug 3.
The strong genetic structure observed in Mediterranean oats affects the predictive ability of genomic prediction as well as the performance of training set optimization methods. In this study, we investigated the efficiency of genomic prediction and training set optimization in a highly structured population of cultivars and landraces of cultivated oat (Avena sativa) from the Mediterranean basin, including white (subsp. sativa) and red (subsp. byzantina) oats, genotyped with genotyping-by-sequencing (GBS) markers and evaluated for agronomic traits in Southern Spain. For most traits, predictive abilities were moderate to high, with little difference between models; the exception was biomass, for which Bayes-B showed a substantial gain over the other models. Consistency between the structure of the training population and that of the population to be predicted was key to the predictive ability of genomic predictions. Indeed, inter-subspecies predictions had much lower predictive ability than intra-subspecies predictions for all traits. Regarding training set optimization, the linear mixed model optimization criteria, the mean prediction error variance (PEVmean) and the mean coefficient of determination (CDmean), performed better than the heuristic approach "partitioning around medoids," even under high population structure. The superiority of CDmean and PEVmean could be explained by their ability to adjust the representation of each genetic group in the training set to match the groups present in the population to be predicted. These results represent an important step towards the implementation of genomic prediction in oat breeding programs and address important issues faced by the genomic prediction community regarding population structure and training set optimization.
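The abstract names the two linear mixed model criteria without defining them. As a hedged illustration only, the sketch below computes CDmean and PEVmean for a candidate training set under a GBLUP-type model y = 1μ + Zg + e, with g ~ N(0, Kσ²g) and e ~ N(0, Iσ²e), using the commonly cited contrast formulation: each individual to be predicted is contrasted against the mean of all candidates, PEV(c) = c'(Z'MZ + λK⁻¹)⁻¹c·σ²e and CD(c) = 1 − λ·c'(Z'MZ + λK⁻¹)⁻¹c / c'Kc. The function name `cd_pev_criteria`, the known variance ratio λ = σ²e/σ²g, the positive definite relationship matrix K, the choice of contrasts and the simulated markers are all assumptions for illustration; this is not the authors' code, and the study's exact implementation may differ.

```python
import numpy as np


def cd_pev_criteria(K, train_idx, target_idx, lam):
    """Hypothetical helper: return (CDmean, PEVmean) for one candidate training set.

    K          : N x N relationship matrix over all candidates (assumed positive definite).
    train_idx  : indices of the phenotyped training individuals.
    target_idx : indices of the individuals to be predicted.
    lam        : variance ratio sigma2_e / sigma2_g (assumed known).
    PEV is returned in units of sigma2_e.
    """
    N = K.shape[0]
    n = len(train_idx)
    # Z maps each training record to its genotype among the N candidates.
    Z = np.zeros((n, N))
    Z[np.arange(n), train_idx] = 1.0
    # M projects out the fixed overall mean (X is a column of ones).
    X = np.ones((n, 1))
    M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)
    # Inverse of the random-effect block of the mixed model equations.
    C = np.linalg.inv(Z.T @ M @ Z + lam * np.linalg.inv(K))
    # One contrast per target: individual i minus the mean of all candidates.
    contrasts = np.eye(N)[:, target_idx] - 1.0 / N
    pev = np.einsum("ij,ik,kj->j", contrasts, C, contrasts)    # c' C c per target
    var_g = np.einsum("ij,ik,kj->j", contrasts, K, contrasts)  # c' K c per target
    cd = 1.0 - lam * pev / var_g
    return float(cd.mean()), float(pev.mean())


# Hypothetical usage on simulated 0/1/2 marker data (not data from the study).
rng = np.random.default_rng(0)
markers = rng.integers(0, 3, size=(60, 200)).astype(float)
W = markers - markers.mean(axis=0)                # centre each marker
K = W @ W.T / W.shape[1] + 1e-3 * np.eye(60)      # small ridge keeps K invertible
target = np.arange(40, 60)                        # individuals to be predicted
cd_small, pev_small = cd_pev_criteria(K, np.arange(0, 20), target, lam=1.0)
cd_large, pev_large = cd_pev_criteria(K, np.arange(0, 40), target, lam=1.0)
print(cd_small, cd_large, pev_small, pev_large)
```

In an actual optimization, a search procedure (for example, an exchange algorithm that swaps individuals in and out of the training set) would maximize CDmean or minimize PEVmean over candidate sets; that search is not shown here. Because the contrasts are built relative to the individuals to be predicted, the criteria reward training sets whose genetic groups resemble those of the target population, which is consistent with the explanation given above.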