Training set optimization under population structure in genomic selection.

Authors

Isidro Julio, Jannink Jean-Luc, Akdemir Deniz, Poland Jesse, Heslot Nicolas, Sorrells Mark E

Affiliation

Cornell University, Ithaca, NY, USA.

Publication information

Theor Appl Genet. 2015 Jan;128(1):145-58. doi: 10.1007/s00122-014-2418-4. Epub 2014 Nov 1.

DOI:10.1007/s00122-014-2418-4

PMID:25367380

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4282691/

Abstract

Population structure must be evaluated before optimization of the training set population. Maximizing the phenotypic variance captured by the training set is important for optimal performance. The optimization of the training set (TRS) in genomic selection has received much interest in both animal and plant breeding, because it is critical to the accuracy of the prediction models. In this study, five TRS sampling algorithms were evaluated for prediction accuracy in the presence of different levels of population structure: stratified sampling, mean of the coefficient of determination (CDmean), mean of prediction error variance (PEVmean), stratified CDmean (StratCDmean), and random sampling. When population structure is present, a sampling method should capture as much phenotypic variation as possible in the TRS. The wheat dataset showed mild population structure, and the CDmean and stratified CDmean methods showed the highest accuracies for all traits except test weight and heading date. The rice dataset had strong population structure, and the approach based on stratified sampling showed the highest accuracies for all traits. In general, CDmean minimized the relationship between genotypes in the TRS while maximizing the relationship between the TRS and the test set. This makes it suitable as an optimization criterion for long-term selection. Our results indicated that the best selection criterion for optimizing the TRS seems to depend on the interaction of trait architecture and population structure.
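As a rough illustration of the stratified sampling idea mentioned in the abstract, the sketch below draws roughly the same fraction of lines from every subpopulation, so that each group is represented in the training set. It is a minimal sketch and not the authors' implementation: the function name `stratified_training_set`, the proportional allocation rule, the "at least one line per group" rule, and the input format (a mapping from line ID to subpopulation label) are all assumptions made for this example.

```python
import random
from collections import defaultdict


def stratified_training_set(subpop_labels, fraction, seed=0):
    """Sample about `fraction` of the lines from every subpopulation.

    subpop_labels: dict mapping line ID -> subpopulation label
                   (hypothetical input format, assumed for this sketch).
    fraction:      share of each subpopulation to place in the training set.
    Returns (training_ids, test_ids) as sorted lists.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")

    rng = random.Random(seed)

    # Group line IDs by subpopulation label.
    by_group = defaultdict(list)
    for line_id, label in subpop_labels.items():
        by_group[label].append(line_id)

    training = []
    for label in sorted(by_group):
        members = sorted(by_group[label])
        # Proportional allocation; keep at least one line per group
        # so that no subpopulation is missing from the training set.
        k = max(1, round(fraction * len(members)))
        training.extend(rng.sample(members, k))

    # Everything not sampled for training becomes the test set.
    training_ids = set(training)
    test = [line_id for line_id in subpop_labels if line_id not in training_ids]
    return sorted(training), sorted(test)


# Toy example: 20 lines in three made-up subpopulations of unequal size.
labels = {
    f"line{i:02d}": ("A" if i < 12 else "B" if i < 18 else "C")
    for i in range(20)
}
train_ids, test_ids = stratified_training_set(labels, fraction=0.5, seed=1)
```

In practice the subpopulation labels would have to come from a prior analysis of population structure (for example clustering on marker data), which is consistent with the abstract's point that structure must be evaluated before the training set is optimized.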

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c9c1/4282691/9e98cdf0eba0/122_2014_2418_Fig1_HTML.jpg

Similar articles

Training set optimization under population structure in genomic selection.

Theor Appl Genet. 2015 Jan;128(1):145-58. doi: 10.1007/s00122-014-2418-4. Epub 2014 Nov 1.

Genomic prediction and training set optimization in a structured Mediterranean oat population.

Theor Appl Genet. 2021 Nov;134(11):3595-3609. doi: 10.1007/s00122-021-03916-w. Epub 2021 Aug 3.

Training population selection and use of fixed effects to optimize genomic predictions in a historical USA winter wheat panel.

Theor Appl Genet. 2019 Apr;132(4):1247-1261. doi: 10.1007/s00122-019-03276-6. Epub 2019 Jan 24.

Optimization of training sets for genomic prediction of early-stage single crosses in maize.

Theor Appl Genet. 2021 Feb;134(2):687-699. doi: 10.1007/s00122-020-03722-w. Epub 2021 Jan 4.

Training set determination for genomic selection.

Theor Appl Genet. 2019 Oct;132(10):2781-2792. doi: 10.1007/s00122-019-03387-0. Epub 2019 Jul 2.

A comparison of methods for training population optimization in genomic selection.

Theor Appl Genet. 2023 Mar 9;136(3):30. doi: 10.1007/s00122-023-04265-6.

Maximizing the reliability of genomic selection by optimizing the calibration set of reference individuals: comparison of methods in two diverse groups of maize inbreds (Zea mays L.).

Genetics. 2012 Oct;192(2):715-28. doi: 10.1534/genetics.112.141473. Epub 2012 Aug 3.

Training set optimization of genomic prediction by means of EthAcc.

PLoS One. 2019 Feb 19;14(2):e0205629. doi: 10.1371/journal.pone.0205629. eCollection 2019.

Optimization of genomic selection training populations with a genetic algorithm.

Genet Sel Evol. 2015 May 6;47(1):38. doi: 10.1186/s12711-015-0116-6.

Optimal Designs for Genomic Selection in Hybrid Crops.

Mol Plant. 2019 Mar 4;12(3):390-401. doi: 10.1016/j.molp.2018.12.022. Epub 2019 Jan 6.

Cited by

Assessment of genomic prediction capabilities of transcriptome data in a barley multi-parent RIL population.

Theor Appl Genet. 2025 Sep 10;138(10):247. doi: 10.1007/s00122-025-05029-0.

Transferability of genomic prediction models across market segments in potato and the effect of selection.

Theor Appl Genet. 2025 Aug 20;138(9):219. doi: 10.1007/s00122-025-05004-9.

Multi-trait/environment sparse genomic prediction using the SFSI R-package.

Plant Genome. 2025 Jun;18(2):e70050. doi: 10.1002/tpg2.70050.

Genomic selection: Essence, applications, and prospects.

Plant Genome. 2025 Jun;18(2):e70053. doi: 10.1002/tpg2.70053.

Genomic prediction in Persian walnut: Optimization levers according to genetic architecture of complex traits.

Plant Genome. 2025 Jun;18(2):e70047. doi: 10.1002/tpg2.70047.

Optimizing genomic prediction for complex traits via investigating multiple factors in switchgrass.

Plant Physiol. 2025 Jul 3;198(3). doi: 10.1093/plphys/kiaf188.

Assessment of genomic prediction capabilities of transcriptome data in a barley multi-parent RIL population.

Res Sq. 2025 Mar 31:rs.3.rs-6145169. doi: 10.21203/rs.3.rs-6145169/v1.

Genomic Prediction for Germplasm Improvement Through Inter-Heterotic-Group Line Crossing in Maize.

Int J Mol Sci. 2025 Mar 15;26(6):2662. doi: 10.3390/ijms26062662.

Optimization of sparse phenotyping strategy in multi-environmental trials in maize.

Theor Appl Genet. 2025 Feb 28;138(3):62. doi: 10.1007/s00122-025-04825-y.

Optimizing fully-efficient two-stage models for genomic selection using open-source software.

Plant Methods. 2025 Feb 4;21(1):9. doi: 10.1186/s13007-024-01318-9.

References

The impact of population structure on genomic prediction in stratified populations.

Theor Appl Genet. 2014 Mar;127(3):749-62. doi: 10.1007/s00122-013-2255-x. Epub 2014 Jan 24.

Pitfalls of predicting complex traits from SNPs.

Nat Rev Genet. 2013 Jul;14(7):507-15. doi: 10.1038/nrg3457.

Priors in whole-genome regression: the bayesian alphabet returns.

Genetics. 2013 Jul;194(3):573-96. doi: 10.1534/genetics.113.151753. Epub 2013 May 1.

Genomic predictability of interconnected biparental maize populations.

Genetics. 2013 Jun;194(2):493-503. doi: 10.1534/genetics.113.150227. Epub 2013 Mar 27.

Comparison of selective genotyping strategies for prediction of breeding values in a population undergoing selection.

J Anim Sci. 2012 Dec;90(13):4716-22. doi: 10.2527/jas.2012-4857.

Genotyping strategies for genomic selection in small dairy cattle populations.

Animal. 2012 Aug;6(8):1216-24. doi: 10.1017/S1751731112000341.

Effectiveness of genomic prediction of maize hybrid performance in different breeding populations and environments.

G3 (Bethesda). 2012 Nov;2(11):1427-36. doi: 10.1534/g3.112.003699. Epub 2012 Nov 1.

Shrinkage estimation of the realized relationship matrix.

G3 (Bethesda). 2012 Nov;2(11):1405-13. doi: 10.1534/g3.112.004259. Epub 2012 Nov 1.

Genomewide predictions from maize single-cross data.

Theor Appl Genet. 2013 Jan;126(1):13-22. doi: 10.1007/s00122-012-1955-y. Epub 2012 Aug 11.

Maximizing the reliability of genomic selection by optimizing the calibration set of reference individuals: comparison of methods in two diverse groups of maize inbreds (Zea mays L.).

Genetics. 2012 Oct;192(2):715-28. doi: 10.1534/genetics.112.141473. Epub 2012 Aug 3.
