Ben Hassen M, Cao T V, Bartholomé J, Orasen G, Colombi C, Rakotomalala J, Razafinimpiasa L, Bertone C, Biselli C, Volante A, Desiderio F, Jacquin L, Valè G, Ahmadi N
Department of Agriculture and Environmental Sciences, University of Milan, Via Giovanni Celoria, 2, 20133, Milan, Italy.
Cirad, UMR AGAP, Avenue Agropolis, 34398, Montpellier Cedex 5, France.
Theor Appl Genet. 2018 Feb;131(2):417-435. doi: 10.1007/s00122-017-3011-4. Epub 2017 Nov 14.
Rice breeding programs based on pedigree schemes can use a genomic model trained with data from their working collection to predict the performance of progenies produced through rapid generation advancement. So far, most potential applications of genomic prediction in plant improvement have been explored using cross-validation approaches. This is the first empirical study to evaluate the accuracy of genomic prediction of progeny performance in a typical rice breeding program. Using cross-validation, we first analyzed the effects of marker selection and statistical method on the accuracy of prediction for three traits of differing heritability in a reference population (RP) of 284 inbred accessions. Next, we investigated the size of RP subsets, and their degree of relatedness to the progeny population (PP), that maximize the accuracy of phenotype prediction across generations, i.e., for 97 F-F lines derived from biparental crosses between 31 accessions of the RP. The extent of linkage disequilibrium (LD) was high (r² = 0.2 at 0.80 Mb in the RP and at 1.1 Mb in the PP). Consequently, increasing the average marker density beyond one marker per 22 kb did not improve the accuracy of predictions in the RP. The accuracy of progeny prediction varied greatly depending on the composition of the training set, the trait, LD and minor allele frequency. The highest accuracy achieved for each trait exceeded 0.50 and was only slightly below the accuracy achieved by cross-validation in the RP. Our results thus show that relatively high accuracy (0.41-0.54) can be achieved using, as the training set, only a rather small share of the RP consisting of the accessions most related to the PP. The practical implications of these results for rice breeding programs are discussed.
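The abstract does not say which statistical models were compared, so the following is only a minimal Python sketch of the general workflow it describes, under stated assumptions: ridge-regression BLUP (RR-BLUP) stands in as the prediction model, accuracy is taken as the Pearson correlation between predicted and observed phenotypes, the training set is chosen by a hypothetical rule (the RP lines with the highest mean VanRaden-style genomic relationship to the PP), and LD is measured as the squared correlation of marker dosages. All function names, the shrinkage parameter `lam`, and the synthetic genotypes and phenotypes are illustrative assumptions, not the authors' pipeline or data.

```python
import numpy as np


def allele_freq(M):
    """Frequency of the counted allele at each marker (dosages coded 0/1/2)."""
    return M.mean(axis=0) / 2.0


def center_markers(M, p):
    """Center a 0/1/2 dosage matrix on the expected dosage 2p."""
    return M - 2.0 * p


def rr_blup(M_train, y_train, M_test, lam):
    """Ridge-regression BLUP: every marker effect is shrunk by the same lam.

    Solved in the dual (n x n) form, beta = Z'(ZZ' + lam*I)^-1 (y - mean),
    which is cheaper when markers outnumber training lines. lam is a
    user-chosen constant here (assumption), not an estimated variance ratio.
    """
    p = allele_freq(M_train)
    Z = center_markers(M_train, p)
    mu = y_train.mean()
    alpha = np.linalg.solve(Z @ Z.T + lam * np.eye(Z.shape[0]), y_train - mu)
    beta = Z.T @ alpha
    return mu + center_markers(M_test, p) @ beta


def kfold_accuracy(M, y, lam, k=5, seed=0):
    """Mean Pearson correlation between predicted and observed values over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    accs = []
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        pred = rr_blup(M[train], y[train], M[test], lam)
        accs.append(np.corrcoef(pred, y[test])[0, 1])
    return float(np.mean(accs))


def most_related_training_set(M_rp, M_pp, size):
    """Hypothetical rule: indices of the `size` RP lines with the highest
    mean genomic relationship (VanRaden-style) to the progeny population."""
    p = allele_freq(np.vstack([M_rp, M_pp]))
    denom = 2.0 * np.sum(p * (1.0 - p))
    G_cross = center_markers(M_rp, p) @ center_markers(M_pp, p).T / denom
    return np.argsort(G_cross.mean(axis=1))[::-1][:size]


def ld_r2(x, y):
    """Squared correlation between two markers' dosages; for fully inbred
    lines (dosages 0 or 2) this equals the haplotype-based r^2."""
    r = np.corrcoef(x, y)[0, 1]
    return r * r


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n_rp, n_pp, n_snp = 284, 97, 500  # sizes echo the abstract; data are synthetic
    M_rp = 2 * rng.integers(0, 2, size=(n_rp, n_snp))  # homozygous inbreds
    parents = rng.choice(n_rp, size=31, replace=False)
    M_pp = np.empty((n_pp, n_snp), dtype=M_rp.dtype)
    for i in range(n_pp):
        a, b = rng.choice(parents, size=2, replace=False)
        # Simplification: each marker comes from either parent independently (no linkage).
        M_pp[i] = np.where(rng.random(n_snp) < 0.5, M_rp[a], M_rp[b])
    effects = rng.normal(0.0, 0.1, n_snp)
    y_rp = M_rp @ effects + rng.normal(0.0, 1.0, n_rp)
    y_pp = M_pp @ effects + rng.normal(0.0, 1.0, n_pp)

    print("5-fold CV accuracy in RP:", kfold_accuracy(M_rp, y_rp, lam=100.0))
    idx = most_related_training_set(M_rp, M_pp, size=100)
    pred = rr_blup(M_rp[idx], y_rp[idx], M_pp, lam=100.0)
    print("progeny accuracy, 100 most related RP lines:", np.corrcoef(pred, y_pp)[0, 1])
    print("LD r^2 between markers 0 and 1 in RP:", ld_r2(M_rp[:, 0], M_rp[:, 1]))
```

The synthetic progeny inherit each marker independently from one of two RP parents, which ignores the strong linkage reported in the abstract; with real data, `M_rp` and `M_pp` would be the observed marker dosage matrices, and `lam` would normally be tuned or derived from estimated variance components rather than fixed.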