Training set optimization under population structure in genomic selection.

Author information

Isidro Julio, Jannink Jean-Luc, Akdemir Deniz, Poland Jesse, Heslot Nicolas, Sorrells Mark E

Affiliation

Cornell University, Ithaca, NY, USA

Publication information

Theor Appl Genet. 2015 Jan;128(1):145-58. doi: 10.1007/s00122-014-2418-4. Epub 2014 Nov 1.

Abstract

Population structure must be evaluated before optimization of the training set population. Maximizing the phenotypic variance captured by the training set is important for optimal performance.

The optimization of the training set (TRS) in genomic selection has received much interest in both animal and plant breeding because it is critical to the accuracy of the prediction models. In this study, five TRS sampling algorithms were evaluated for prediction accuracy under different levels of population structure: stratified sampling, mean of the coefficient of determination (CDmean), mean of prediction error variance (PEVmean), stratified CDmean (StratCDmean) and random sampling. In the presence of population structure, a sampling method that captures as much phenotypic variation as possible in the TRS is desirable. The wheat dataset showed mild population structure, and the CDmean and stratified CDmean methods showed the highest accuracies for all traits except test weight and heading date. The rice dataset had strong population structure, and the approach based on stratified sampling showed the highest accuracies for all traits. In general, CDmean minimized the relationship between genotypes within the TRS while maximizing the relationship between the TRS and the test set, which makes it suitable as an optimization criterion for long-term selection. Our results indicate that the best criterion for optimizing the TRS seems to depend on the interaction between trait architecture and population structure.
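The abstract names CDmean and PEVmean as TRS optimization criteria without stating formulas. The following is a minimal sketch of how such criteria can be computed, assuming a standard GBLUP mixed model y = 1·mu + Z·u + e with u ~ N(0, K·sigma_a^2), e ~ N(0, I·sigma_e^2), and lambda = sigma_e^2 / sigma_a^2 treated as known. For each test line it uses the contrast between that line and the population mean. With C = (Z'MZ + lambda·K^-1)^-1, PEV is c'Cc (in units of sigma_e^2) and CD is c'(K - lambda·C)c / c'Kc. The function names `cd_and_pev` and `optimize_trs`, the random-exchange search, and the toy marker data are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def cd_and_pev(K, train_idx, test_idx, lam):
    """Per-line CD and PEV (in units of sigma_e^2) for the test lines.

    Assumed model: y = 1*mu + Z u + e, u ~ N(0, K sigma_a^2),
    e ~ N(0, I sigma_e^2), lam = sigma_e^2 / sigma_a^2.
    """
    N = K.shape[0]
    n = len(train_idx)
    Z = np.zeros((n, N))
    Z[np.arange(n), train_idx] = 1.0            # phenotyped (TRS) lines -> all lines
    M = np.eye(n) - np.ones((n, n)) / n         # absorbs the intercept mu
    C = np.linalg.inv(Z.T @ M @ Z + lam * np.linalg.inv(K))
    cd = np.empty(len(test_idx))
    pev = np.empty(len(test_idx))
    for k, i in enumerate(test_idx):
        c = -np.ones(N) / N                     # contrast: line i minus population mean
        c[i] += 1.0
        pev[k] = c @ C @ c
        cd[k] = c @ (K - lam * C) @ c / (c @ K @ c)
    return cd, pev


def optimize_trs(K, candidates, test_idx, n_train, lam, n_iter=200, seed=0):
    """Hypothetical random-exchange search that maximizes CDmean on test_idx."""
    rng = np.random.default_rng(seed)
    candidates = np.asarray(candidates)
    trs = rng.choice(candidates, size=n_train, replace=False)
    best = cd_and_pev(K, trs, test_idx, lam)[0].mean()
    for _ in range(n_iter):
        pool = np.setdiff1d(candidates, trs)    # candidates not yet in the TRS
        if pool.size == 0:
            break
        trial = trs.copy()
        trial[rng.integers(n_train)] = rng.choice(pool)   # swap one line out
        score = cd_and_pev(K, trial, test_idx, lam)[0].mean()
        if score > best:                        # keep only improving swaps
            trs, best = trial, score
    return np.sort(trs), best


# Toy data (hypothetical): 30 lines x 100 markers coded 0/1/2.
rng = np.random.default_rng(1)
W = rng.integers(0, 3, size=(30, 100)).astype(float)
W -= W.mean(axis=0)
K = W @ W.T / W.shape[1] + 1e-3 * np.eye(30)    # small ridge keeps K invertible
test = np.arange(20, 30)                        # lines to be predicted
trs, cdmean = optimize_trs(K, np.arange(20), test, n_train=10, lam=1.0)
pevmean = cd_and_pev(K, trs, test, lam=1.0)[1].mean()
print(trs, round(cdmean, 3), round(pevmean, 3))
```

Because the exchange step only accepts swaps that raise CDmean, the returned set is never worse than its random starting set. A stratified variant such as StratCDmean would add constraints on how many lines are drawn from each subpopulation. That variant is not shown here.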

Fig. 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c9c1/4282691/9e98cdf0eba0/122_2014_2418_Fig1_HTML.jpg