Plant Breeding, TUM School of Life Sciences, Technical University of Munich, 85354, Freising, Germany.
Statistical Genetics Unit, RAGT 2N, 1 Route de Moyrazès, 12510, Druelle, France.
Theor Appl Genet. 2021 Sep;134(9):3069-3081. doi: 10.1007/s00122-021-03880-5. Epub 2021 Jun 12.
Model training on data from all selection cycles yielded the highest prediction accuracy by attenuating specific effects of individual cycles. Expected reliability was a robust predictor of accuracies obtained with different calibration sets.

The transition from phenotypic to genome-based selection requires a profound understanding of the factors that determine genomic prediction accuracy. We analysed experimental data from a commercial maize breeding programme to investigate whether genomic measures can assist in identifying optimal calibration sets for model training. The data set consisted of six contiguous selection cycles comprising testcrosses of 5968 doubled haploid lines genotyped with a minimum of 12,000 SNP markers. We evaluated genomic prediction accuracies in two independent prediction sets in combination with calibration sets differing in sample size and in genomic measures (effective sample size, average maximum kinship, expected reliability, number of common polymorphic SNPs and linkage phase similarity). Across selection cycles, prediction accuracies were as high as 0.57 for grain dry matter yield and 0.76 for grain dry matter content. Including data from all selection cycles in model training yielded the best results because interactions between calibration and prediction sets, as well as the effects of different testers and specific years, were attenuated. Among the genomic measures, the expected reliability of genomic breeding values was the best predictor of the empirical accuracies obtained with different calibration sets. For grain yield, a large difference between expected and empirical reliability was observed in one prediction set. We propose to use this difference as guidance for determining the weight that phenotypic data from a given selection cycle should receive in model retraining, and for selection when both genomic breeding values and phenotypes are available.
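As an illustration of two of the genomic measures named above, the following is a minimal sketch and not the authors' implementation; the abstract does not give their formulas. It assumes a precomputed kinship matrix `k`, a GBLUP model without fixed effects, and a known heritability `h2`. Under those assumptions the expected reliability of line i is k_i'(K_cc + λI)^(-1) k_i / K_ii with λ = (1 - h2) / h2, and average maximum kinship is taken as the mean, over prediction lines, of the largest kinship with any calibration line. All function and variable names are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def solve(a: Matrix, b: Sequence[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Work on an augmented copy so the caller's matrix is left untouched.
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def average_max_kinship(k: Matrix, calib: Sequence[int], pred: Sequence[int]) -> float:
    """Mean over prediction lines of their largest kinship with a calibration line."""
    return sum(max(k[i][j] for j in calib) for i in pred) / len(pred)


def expected_reliability(k: Matrix, calib: Sequence[int], pred: Sequence[int], h2: float) -> float:
    """Mean expected reliability of the prediction lines (assumed GBLUP, no fixed effects)."""
    lam = (1.0 - h2) / h2  # ratio of residual to genetic variance
    # K_cc + lambda * I for the calibration lines (indices assumed unique).
    v = [[k[a][b] + (lam if a == b else 0.0) for b in calib] for a in calib]
    reliabilities = []
    for i in pred:
        k_ic = [k[i][j] for j in calib]  # kinship of line i with the calibration set
        w = solve(v, k_ic)
        reliabilities.append(sum(x * y for x, y in zip(k_ic, w)) / k[i][i])
    return sum(reliabilities) / len(reliabilities)


# Toy kinship matrix: lines 0 and 1 form the calibration set, line 2 is predicted.
toy_k = [
    [1.0, 0.0, 0.5],
    [0.0, 1.0, 0.0],
    [0.5, 0.0, 1.0],
]
amk = average_max_kinship(toy_k, calib=[0, 1], pred=[2])
rel = expected_reliability(toy_k, calib=[0, 1], pred=[2], h2=0.5)
```

In this toy case the predicted line is related only to calibration line 0, so dropping line 0 from the calibration set drives the expected reliability to zero. Comparing such expected values with the empirical reliability in a prediction set is the kind of contrast the abstract proposes as guidance for weighting a cycle's phenotypic data.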