Howard Réka, Jarquin Diego, Crossa José
University of Nebraska-Lincoln, Lincoln, NE, USA.
International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico.
Methods Mol Biol. 2022;2467:139-156. doi: 10.1007/978-1-0716-2205-6_5.
Genomic selection (GS) is a methodology that has revolutionized the breeding of improved genetic materials in plant and animal breeding programs. It uses the predicted genomic values of untested/unobserved genotypes as surrogates for their phenotypes during the selection process. Because these predicted genomic values are obtained exclusively from the marker profiles of the untested genotypes, breeders can use them to screen the genotypes to be advanced in the breeding pipeline, to identify potential parents for the next improvement cycles, or to find optimal crosses for target genotypes, among other uses. Conceptually, GS first requires a set of genotypes with both molecular marker information and phenotypic data for model calibration; the performance of untested genotypes is then predicted from their marker profiles only, and breeders are expected to base their selections on these predicted values. Even though the concept of GS seems trivial, the data delivered by modern sequencing technologies are high-dimensional: the number of molecular markers (p) far exceeds the number of data points available for model fitting (n; p ≫ n). An entirely new set of prediction models was therefore needed to cope with this challenge. In this chapter, we provide a conceptual framework for comparing statistical models that overcome the "large p, small n" problem. Given the very large diversity of GS models, only the most popular are presented here. We focus mainly on linear regression-based models and nonparametric models that predict genomic estimated breeding values (GEBV) for a single trait in a single environment, primarily in the context of plant breeding.
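As a minimal sketch of the calibrate-then-predict workflow and of one way a linear model can cope with p ≫ n, the Python code below fits ridge regression (an RR-BLUP-style model, one member of the linear regression-based family) to simulated data and then predicts untested genotypes from their markers alone. Everything here is an illustrative assumption, not the chapter's own code: the simulated 0/1/2 marker matrix, the hypothetical functions `simulate`, `fit_ridge`, and `predict`, the heritability `h2` treated as known, and the shrinkage heuristic used for `lam`. The fit solves an n × n dual system instead of the p × p normal equations, so its cost is governed by the small n rather than the large p.

```python
import numpy as np


def simulate(n_train=300, n_test=100, p=1000, n_qtl=50, h2=0.8, seed=1):
    """Hypothetical simulation: 0/1/2 marker genotypes and one trait in one environment."""
    rng = np.random.default_rng(seed)
    n = n_train + n_test
    freqs = rng.uniform(0.1, 0.9, size=p)  # allele frequency of each marker
    X = rng.binomial(2, freqs, size=(n, p)).astype(float)
    effects = np.zeros(p)
    qtl = rng.choice(p, size=n_qtl, replace=False)  # hypothetical causal loci
    effects[qtl] = rng.normal(0.0, 1.0, size=n_qtl)
    g = X @ effects  # true genetic values
    e = rng.normal(0.0, np.sqrt(g.var() * (1 - h2) / h2), size=n)
    y = g + e  # observed phenotypes
    # Training set: markers + phenotypes. Test set: markers only (g kept for scoring).
    return X[:n_train], y[:n_train], X[n_train:], g[n_train:]


def fit_ridge(X, y, lam):
    """Ridge regression of y on all markers jointly, solved in the n x n dual form.

    Uses the identity (X'X + lam*I_p)^-1 X' = X'(XX' + lam*I_n)^-1, so only an
    n x n linear system is solved even though p >> n.
    """
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    K = Xc @ Xc.T  # n x n matrix of marker cross-products
    alpha = np.linalg.solve(K + lam * np.eye(len(y)), y - y_mean)
    beta = Xc.T @ alpha  # p shrunken marker effects
    return x_mean, y_mean, beta


def predict(model, X_new):
    """Predict GEBV-like values from marker profiles only (no phenotypes needed)."""
    x_mean, y_mean, beta = model
    return y_mean + (X_new - x_mean) @ beta


if __name__ == "__main__":
    X_train, y_train, X_test, g_test = simulate()
    h2 = 0.8  # assumed known here; in practice it would be estimated
    # RR-BLUP-style shrinkage: lam = sigma_e^2 / sigma_beta^2 = (1 - h2)/h2 * sum(2pq)
    lam = (1 - h2) / h2 * X_train.var(axis=0).sum()
    model = fit_ridge(X_train, y_train, lam)
    gebv = predict(model, X_test)
    r = np.corrcoef(gebv, g_test)[0, 1]  # correlation with true genetic values
    print(f"n_train={X_train.shape[0]}, p={X_train.shape[1]}, accuracy r={r:.2f}")
```

Because `predict` uses only `X_test`, it applies equally to genotypes that have never been phenotyped. The accuracy printed for these independent simulated markers should be read as illustrative only, since real breeding populations also carry relatedness and linkage disequilibrium that this simulation ignores.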