Rio Simon, Charcosset Alain, Mary-Huard Tristan, Moreau Laurence, Rincent Renaud
CIRAD, UMR AGAP Institut, Montpellier, France.
UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, France.
Methods Mol Biol. 2022;2467:77-112. doi: 10.1007/978-1-0716-2205-6_3.
The efficiency of genomic selection strongly depends on how accurately the genetic merit of candidates is predicted. Numerous papers have shown that the composition of the calibration set is a key contributor to prediction accuracy. A poorly defined calibration set can result in low accuracy, whereas an optimized one can considerably increase accuracy compared with random sampling at the same size. Alternatively, optimizing the calibration set can reduce phenotyping costs by achieving accuracy similar to that of random sampling with fewer phenotypic units. We present the factors that have to be considered when designing a calibration set and review the criteria proposed in the literature. We classify these criteria into two groups: model-free criteria based on relatedness, and criteria derived from the linear mixed model. We introduce criteria targeting specific prediction objectives, including the prediction of highly diverse panels, biparental families, or hybrids. We also review ways of updating the calibration set and procedures for optimizing phenotyping experimental designs.