Lehermeier Christina, Krämer Nicole, Bauer Eva, Bauland Cyril, Camisan Christian, Campo Laura, Flament Pascal, Melchinger Albrecht E, Menz Monica, Meyer Nina, Moreau Laurence, Moreno-González Jesús, Ouzunova Milena, Pausch Hubert, Ranc Nicolas, Schipprack Wolfgang, Schönleben Manfred, Walter Hildrun, Charcosset Alain, Schön Chris-Carolin
Plant Breeding, Technische Universität München, 85354 Freising, Germany.
INRA, UMR de Génétique Végétale, 91190 Gif-sur-Yvette, France.
Genetics. 2014 Sep;198(1):3-16. doi: 10.1534/genetics.114.161943.
The efficiency of marker-assisted prediction of phenotypes has been studied intensively for different types of plant breeding populations. However, one remaining question is how to incorporate and balance information from biparental and multiparental populations in model training for genome-wide prediction. To address this question, we evaluated testcross performance of 1652 doubled-haploid maize (Zea mays L.) lines that were genotyped with 56,110 single nucleotide polymorphism markers and phenotyped for five agronomic traits in four to six European environments. The lines are arranged in two diverse half-sib panels representing two major European heterotic germplasm pools. The data set contains 10 related biparental dent families and 11 related biparental flint families generated from crosses of maize lines important for European maize breeding. With this new data set we analyzed genome-based best linear unbiased prediction in different validation schemes and compositions of estimation and test sets. Further, we theoretically and empirically investigated marker linkage phases across multiparental populations. In general, predictive abilities similar to or higher than those within biparental families could be achieved by combining several half-sib families in the estimation set. For the majority of families, 375 half-sib lines in the estimation set were sufficient to reach the same predictive performance for biomass yield as an estimation set of 50 full-sib lines. In contrast, prediction across heterotic pools was not possible in most cases. Our findings are important for experimental design in genome-based prediction as they provide guidelines for the genetic structure and required sample size of data sets used for model training.
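The abstract refers to genome-based best linear unbiased prediction evaluated with separate estimation and test sets. The following is a minimal sketch of that general technique on simulated data, not the authors' model or data. Everything in it is an assumption made for illustration: the function names, the 0/1 marker coding for doubled-haploid lines, the scaling of the relationship matrix to a mean diagonal of 1, and the fixed heritability `h2` used to set the shrinkage parameter (in practice, variance components would be estimated, for example by REML).

```python
import numpy as np


def genomic_relationship(markers):
    """Marker-based relationship matrix, scaled so its mean diagonal is 1.

    markers: (n_lines, n_markers) array, assumed coded 0/1 for homozygous
    doubled-haploid lines (an illustrative coding, not taken from the paper).
    """
    centered = markers - markers.mean(axis=0)
    # Sum of per-marker variances (ddof=0) makes trace(G) / n equal to 1.
    scale = centered.var(axis=0).sum()
    return centered @ centered.T / scale


def gblup_predict(G, y, train_idx, test_idx, h2=0.5):
    """Predict test-set values from estimation-set phenotypes.

    h2 is an assumed heritability; the shrinkage parameter is
    lambda = (1 - h2) / h2.
    """
    lam = (1.0 - h2) / h2
    G_train = G[np.ix_(train_idx, train_idx)]
    G_test_train = G[np.ix_(test_idx, train_idx)]
    mu = y[train_idx].mean()
    # Solve (G_train + lambda * I) alpha = y_train - mu without an explicit inverse.
    alpha = np.linalg.solve(G_train + lam * np.eye(len(train_idx)), y[train_idx] - mu)
    return mu + G_test_train @ alpha


def predictive_ability(y_obs, y_pred):
    """Pearson correlation between observed and predicted phenotypes."""
    return float(np.corrcoef(y_obs, y_pred)[0, 1])


def simulate(n_lines=400, n_markers=50, h2=0.5, seed=0):
    """Simulate unrelated lines with additive marker effects (toy data only)."""
    rng = np.random.default_rng(seed)
    markers = rng.integers(0, 2, size=(n_lines, n_markers)).astype(float)
    effects = rng.normal(0.0, 1.0, n_markers)
    genetic = markers @ effects
    noise_sd = np.sqrt(genetic.var() * (1.0 - h2) / h2)
    y = genetic + rng.normal(0.0, noise_sd, n_lines)
    return markers, y


if __name__ == "__main__":
    markers, y = simulate()
    G = genomic_relationship(markers)
    train_idx = np.arange(300)        # estimation set
    test_idx = np.arange(300, 400)    # test set
    y_pred = gblup_predict(G, y, train_idx, test_idx)
    print("predictive ability:", predictive_ability(y[test_idx], y_pred))
```

In this sketch, changing which lines go into `train_idx` and `test_idx` (for example, full sibs of the test lines versus lines from related half-sib families) is how different compositions of estimation and test sets would be compared; the simulated lines here carry no family structure, so it does not reproduce the comparisons reported in the abstract.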