Training set determination for genomic selection.

Affiliation

Department of Agronomy, National Taiwan University, Taipei, Taiwan.

Publication information

Theor Appl Genet. 2019 Oct;132(10):2781-2792. doi: 10.1007/s00122-019-03387-0. Epub 2019 Jul 2.

Abstract

A new optimality criterion is proposed to determine a training set for genomic selection. The criterion is derived from Pearson's correlation between genomic estimated breeding values (GEBVs) and phenotypic values of a test set. R functions are provided to generate the optimal training set.

For a specified test set, we develop a highly efficient algorithm to determine an optimal subset from a large candidate set in which the individuals have been genotyped but not yet phenotyped. The chosen subset serves as a training set to be phenotyped, and a genomic selection (GS) model is then built from its phenotype and genotype data. In this study, we consider the additive-effects whole-genome regression model and adopt ridge regression estimation for marker effects in the GS model. The resulting GS model is then employed to predict GEBVs for the individuals of the test set, which have been genotyped only. We propose a new optimality criterion to determine the required training set, which is derived directly from Pearson's correlation between GEBVs and phenotypic values of the test set. Pearson's correlation is the standard measure for prediction accuracy of a GS model. Our proposed methods can be applied to data with varying degrees of population structure. All the R functions for implementing our training set determination algorithms are available from the R package TSDFGS. The algorithms are illustrated with two datasets that have strong (rice genome dataset) and mild (wheat genome dataset) population structures. Our methods are shown to be advantageous over existing ones, mainly because they fully use the genomic relationship between the test set and the training set by taking into account both the variance and bias for predicting GEBVs.
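
The workflow the abstract describes (estimate marker effects by ridge regression on a phenotyped training set, predict GEBVs for a test set that has only been genotyped, then score the prediction by Pearson's correlation) can be sketched in a few lines. This is an illustrative sketch only. It is not the TSDFGS implementation, and it does not compute the paper's optimality criterion. The simulated data, the -1/0/1 marker coding, the column centering, the shrinkage value `lam`, and every function name below are assumptions made for the example.

```python
import math
import random


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ridge_marker_effects(x_train, y_train, lam):
    """Return (mu, column_means, beta) from (Xc'Xc + lam*I) beta = Xc'(y - mu).

    Xc is the training marker matrix centered by its own column means
    (an assumption of this sketch, not something stated in the abstract).
    """
    n, p = len(x_train), len(x_train[0])
    mu = sum(y_train) / n
    means = [sum(row[j] for row in x_train) / n for j in range(p)]
    xc = [[row[j] - means[j] for j in range(p)] for row in x_train]
    yc = [y - mu for y in y_train]
    lhs = [[sum(r[j] * r[k] for r in xc) + (lam if j == k else 0.0)
            for k in range(p)] for j in range(p)]
    rhs = [sum(r[j] * y for r, y in zip(xc, yc)) for j in range(p)]
    return mu, means, solve_linear(lhs, rhs)


def predict_gebv(x_new, mu, means, beta):
    """Predicted value for each genotyped-only individual."""
    return [mu + sum((v - c) * b for v, c, b in zip(row, means, beta))
            for row in x_new]


def pearson(u, v):
    """Pearson's correlation, used here as the prediction-accuracy measure."""
    mu_u, mu_v = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu_u) * (b - mu_v) for a, b in zip(u, v))
    var_u = sum((a - mu_u) ** 2 for a in u)
    var_v = sum((b - mu_v) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


def demo(seed=1, n_train=40, n_test=20, n_markers=10, lam=1.0):
    """Simulate toy data (hypothetical), fit on the training rows, score on the test rows."""
    rng = random.Random(seed)
    n = n_train + n_test
    markers = [[rng.choice((-1, 0, 1)) for _ in range(n_markers)] for _ in range(n)]
    effects = [rng.gauss(0, 1) for _ in range(n_markers)]
    pheno = [10 + sum(g * e for g, e in zip(row, effects)) + rng.gauss(0, 1)
             for row in markers]
    mu, means, beta = ridge_marker_effects(markers[:n_train], pheno[:n_train], lam)
    gebv = predict_gebv(markers[n_train:], mu, means, beta)
    return pearson(gebv, pheno[n_train:])


if __name__ == "__main__":
    print(f"prediction accuracy on the toy test set: {demo():.3f}")
```

In this sketch the training rows are simply the first `n_train` individuals. The method in the abstract instead chooses which candidates to phenotype, so that the resulting correlation on the specified test set is as high as possible.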
