Genome-wide prediction of traits with different genetic architecture through efficient variable selection.

Affiliation

Plant Breeding, Technische Universität München, 85354 Freising, Germany.

Publication information

Genetics. 2013 Oct;195(2):573-87. doi: 10.1534/genetics.113.150078. Epub 2013 Aug 9.

Abstract

In genome-based prediction there is considerable uncertainty about the statistical model and method required to maximize prediction accuracy. For traits influenced by a small number of quantitative trait loci (QTL), predictions are expected to benefit from methods performing variable selection [e.g., BayesB or the least absolute shrinkage and selection operator (LASSO)] compared to methods distributing effects across the genome [ridge regression best linear unbiased prediction (RR-BLUP)]. We investigate the assumptions underlying successful variable selection by combining computer simulations with large-scale experimental data sets from rice (Oryza sativa L.), wheat (Triticum aestivum L.), and Arabidopsis thaliana (L.). We demonstrate that variable selection can be successful when the number of phenotyped individuals is much larger than the number of causal mutations contributing to the trait. We show that the sample size required for efficient variable selection increases dramatically with decreasing trait heritabilities and increasing extent of linkage disequilibrium (LD). We contrast and discuss contradictory results from simulation and experimental studies with respect to superiority of variable selection methods over RR-BLUP. Our results demonstrate that due to long-range LD, medium heritabilities, and small sample sizes, superiority of variable selection methods cannot be expected in plant breeding populations even for traits like FRIGIDA gene expression in Arabidopsis and flowering time in rice, assumed to be influenced by a few major QTL. We extend our conclusions to the analysis of whole-genome sequence data and infer upper bounds for the number of causal mutations which can be identified by LASSO. Our results have major impact on the choice of statistical method needed to make credible inferences about genetic architecture and prediction accuracy of complex traits.
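The contrast the abstract draws between variable selection (LASSO) and effect-spreading methods (RR-BLUP) can be explored with a small simulation. The following is a minimal sketch under stated assumptions, not the authors' pipeline: markers are independent and coded -1/+1 (so there is no LD), every causal marker has effect 1, both models are fitted by plain cyclic coordinate descent, the ridge penalty is set to the RR-BLUP ratio of residual to marker-effect variance, and the LASSO penalty is a fixed noise-level heuristic rather than a cross-validated value. All function names and parameter values are hypothetical.

```python
import math
import random


def soft_threshold(z, lam):
    """Soft-thresholding operator behind the LASSO coordinate update."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def coordinate_descent(X, y, lam, penalty, n_iter=30):
    """Estimate marker effects by cyclic coordinate descent.

    penalty="l1": minimize 0.5*||y - Xb||^2 + lam*sum|b_j|      (LASSO)
    penalty="l2": minimize 0.5*||y - Xb||^2 + 0.5*lam*sum b_j^2 (ridge)
    """
    n, p = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n)] for j in range(p)]
    ss = [sum(v * v for v in col) for col in cols]
    beta = [0.0] * p
    resid = list(y)
    for _ in range(n_iter):
        for j in range(p):
            if ss[j] == 0.0:
                continue  # a monomorphic marker carries no information
            # inner product of marker j with its partial residual
            rho = sum(c * r for c, r in zip(cols[j], resid)) + ss[j] * beta[j]
            if penalty == "l1":
                new = soft_threshold(rho, lam) / ss[j]
            else:
                new = rho / (ss[j] + lam)
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - delta * c for r, c in zip(resid, cols[j])]
                beta[j] = new
    return beta


def simulate(n, p, n_qtl, h2, rng):
    """Assumed toy model: independent +/-1 markers, n_qtl causal, effect 1."""
    X = [[rng.choice((-1.0, 1.0)) for _ in range(p)] for _ in range(n)]
    qtl = rng.sample(range(p), n_qtl)
    g = [sum(row[j] for j in qtl) for row in X]  # true genetic values
    sd_e = math.sqrt(n_qtl * (1.0 - h2) / h2)    # noise SD implied by h2
    y = [gi + rng.gauss(0.0, sd_e) for gi in g]
    return X, y, g, sd_e


def correlation(a, b):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((x - mb) ** 2 for x in b)
    if va == 0.0 or vb == 0.0:
        return 0.0
    return sum((x - ma) * (z - mb) for x, z in zip(a, b)) / math.sqrt(va * vb)


def run_demo(n_train=200, n_test=100, p=100, n_qtl=5, h2=0.5, seed=1):
    """Return test-set accuracy (correlation with true genetic values)."""
    rng = random.Random(seed)
    X, y, g, sd_e = simulate(n_train + n_test, p, n_qtl, h2, rng)
    X_tr, y_tr = X[:n_train], y[:n_train]
    X_te, g_te = X[n_train:], g[n_train:]
    mean_y = sum(y_tr) / n_train
    y_c = [v - mean_y for v in y_tr]             # center the phenotype
    lam_ridge = p * (1.0 - h2) / h2              # residual / marker variance
    lam_lasso = sd_e * math.sqrt(2.0 * n_train * math.log(p))  # heuristic
    result = {}
    for name, penalty, lam in (("ridge", "l2", lam_ridge),
                               ("lasso", "l1", lam_lasso)):
        beta = coordinate_descent(X_tr, y_c, lam, penalty)
        pred = [sum(b * x for b, x in zip(beta, row)) for row in X_te]
        result[name] = correlation(pred, g_te)
    return result


if __name__ == "__main__":
    print(run_demo())
```

Varying `n_train`, `n_qtl`, and `h2` in `run_demo` gives a rough feel for how sample size, number of causal loci, and heritability shift the balance between the two approaches. Because the sketch assumes independent markers, it cannot reproduce the LD effects that the abstract identifies as a main obstacle to variable selection in plant breeding populations.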
