Imputation of unordered markers and the impact on genomic selection accuracy.

Affiliation

Department of Plant Breeding and Genetics, Cornell University, Ithaca, New York 14853, USA.

Publication information

G3 (Bethesda). 2013 Mar;3(3):427-39. doi: 10.1534/g3.112.005363. Epub 2013 Mar 1.

Abstract

Genomic selection, a breeding method that promises to accelerate rates of genetic gain, requires dense, genome-wide marker data. Genotyping-by-sequencing can generate a large number of de novo markers. However, without a reference genome, these markers are unordered and typically have a large proportion of missing data. Because marker imputation algorithms were developed for species with a reference genome, algorithms suited for unordered markers have not been rigorously evaluated. Using four empirical datasets, we evaluate and characterize four such imputation methods, referred to as k-nearest neighbors, singular value decomposition, random forest regression, and expectation maximization imputation, in terms of their imputation accuracies and the factors affecting accuracy. The effect of imputation method on genomic selection accuracy is assessed in comparison with mean imputation. The effect of excluding markers with a large proportion of missing data on genomic selection accuracy is also examined. Our results show that imputation of unordered markers can be accurate, especially when linkage disequilibrium between markers is high and genotyped individuals are related. Of the methods evaluated, random forest regression imputation produced superior accuracy. In comparison with mean imputation, all four imputation methods we evaluated led to greater genomic selection accuracies when the level of missing data was high. Including rather than excluding markers with a large proportion of missing data nearly always led to greater genomic selection accuracies. We conclude that high levels of missing data in dense marker sets are not a major obstacle for genomic selection, even when marker order is not known.
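To make the contrast between mean imputation and a neighbor-based method concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: genotypes are assumed to be coded 0/1/2 with `None` for missing calls, neighbors are taken to be other markers ranked by a root-mean-square distance, and the function names (`mean_impute`, `marker_distance`, `knn_impute`) and the toy matrix are hypothetical. Neither function uses marker order.

```python
from math import sqrt

def mean_impute(matrix):
    """Replace each missing call (None) with that marker's observed mean."""
    result = [row[:] for row in matrix]
    for j in range(len(matrix[0])):
        observed = [row[j] for row in matrix if row[j] is not None]
        col_mean = sum(observed) / len(observed) if observed else 0.0
        for row in result:
            if row[j] is None:
                row[j] = col_mean
    return result

def marker_distance(matrix, a, b):
    """RMS difference between markers a and b over individuals scored at both."""
    pairs = [(row[a], row[b]) for row in matrix
             if row[a] is not None and row[b] is not None]
    if not pairs:
        return float("inf")  # no shared individuals: treat as maximally distant
    return sqrt(sum((x - y) ** 2 for x, y in pairs) / len(pairs))

def knn_impute(matrix, k=2):
    """Fill a missing call with the mean of the k closest markers scored in
    the same individual (illustrative assumption: neighbors are markers)."""
    fallback = mean_impute(matrix)
    n_markers = len(matrix[0])
    result = [row[:] for row in matrix]
    for j in range(n_markers):
        ranked = sorted((marker_distance(matrix, j, m), m)
                        for m in range(n_markers) if m != j)
        for i, row in enumerate(matrix):
            if row[j] is not None:
                continue
            donors = [row[m] for d, m in ranked
                      if d != float("inf") and row[m] is not None][:k]
            # Fall back to the marker mean when no usable neighbor exists.
            result[i][j] = sum(donors) / len(donors) if donors else fallback[i][j]
    return result

# Toy data: rows are individuals, columns are unordered markers.
genotypes = [
    [0, 0, 2, 2],
    [2, 2, 0, 0],
    [0, None, 2, 2],
    [2, 2, None, 0],
    [0, 0, 2, None],
]
mean_filled = mean_impute(genotypes)
knn_filled = knn_impute(genotypes, k=1)
```

In this toy matrix, markers 0 and 1 are perfectly correlated, as are markers 2 and 3, so the neighbor-based fill recovers the pattern while the mean fill returns an intermediate value. That mirrors, in miniature, the abstract's observation that imputation of unordered markers works best when linkage disequilibrium between markers is high.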

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4ae0/3583451/825d5de15180/427f1.jpg
