An ensemble-based approach to imputation of moderate-density genotypes for genomic selection with application to Angus cattle.

Author information

Sun Chuanyu, Wu Xiao-Lin, Weigel Kent A, Rosa Guilherme J M, Bauck Stewart, Woodward Brent W, Schnabel Robert D, Taylor Jeremy F, Gianola Daniel

Affiliation

Department of Dairy Science, University of Wisconsin, Madison, WI 53706, USA.

Publication information

Genet Res (Camb). 2012 Jun;94(3):133-50. doi: 10.1017/S001667231200033X. Epub 2012 Jul 18.

Abstract

Imputation of moderate-density genotypes from low-density panels is of increasing interest in genomic selection because it can dramatically reduce genotyping costs. Several imputation software packages have been developed, but they vary in imputation accuracy, and imputed genotypes may be inconsistent among methods. An AdaBoost-like approach is proposed to combine imputation results from several independent software packages, namely Beagle (v3.3), IMPUTE (v2.0), fastPHASE (v1.4), AlphaImpute, findhap (v2) and Fimpute (v2), with each package serving as a base classifier in an ensemble-based system. The ensemble-based method computes weights sequentially for all classifiers and combines results from the component methods via weighted majority 'voting' to determine unknown genotypes. The data included 3078 registered Angus cattle, each genotyped with the Illumina BovineSNP50 BeadChip. SNP genotypes on three chromosomes (BTA1, BTA16 and BTA28) were used to compare imputation accuracy among methods, and the application involved the imputation of 50K genotypes covering 29 chromosomes from a set of 5K genotypes. Beagle and Fimpute had the greatest accuracy among the six imputation packages, whose accuracies ranged from 0.8677 to 0.9858. The proposed ensemble method was more accurate than any of these packages, but the sequence of independent classifiers in the voting scheme affected imputation accuracy. The ensemble systems yielding the best imputation accuracies were those that had Beagle as the first classifier, followed by one or two methods that utilized pedigree information. A salient feature of the proposed ensemble method is that it can resolve imputation inconsistencies among different imputation methods, hence leading to a more reliable system for imputing genotypes relative to independent methods.
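
The sequential weighting and weighted majority voting described in the abstract might be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the weight formula, so a SAMME-style multi-class AdaBoost update is swapped in; genotypes are assumed to be coded 0/1/2; and the function names `sequential_weights` and `weighted_vote`, as well as the toy calls, are hypothetical and not taken from the paper or from any of the imputation packages.

```python
import math
from collections import defaultdict


def sequential_weights(predictions, truth, n_classes=3, eps=1e-10):
    """Compute one voting weight per classifier, in the order given.

    predictions: list of per-classifier genotype calls on validation loci
    truth: the known genotypes at those same loci
    """
    n = len(truth)
    sample_w = [1.0 / n] * n  # uniform weights over validation genotypes
    alphas = []
    for pred in predictions:
        miss = [p != t for p, t in zip(pred, truth)]
        err = sum(w for w, m in zip(sample_w, miss) if m)
        err = min(max(err, eps), 1.0 - eps)  # keep the logarithm finite
        # SAMME-style multi-class weight (an assumption, not the paper's formula)
        alpha = math.log((1.0 - err) / err) + math.log(n_classes - 1)
        alphas.append(alpha)
        # Up-weight the genotypes this classifier missed, then renormalise,
        # so later classifiers are judged mainly on the harder genotypes
        sample_w = [w * math.exp(alpha) if m else w
                    for w, m in zip(sample_w, miss)]
        total = sum(sample_w)
        sample_w = [w / total for w in sample_w]
    return alphas


def weighted_vote(calls, alphas):
    """Return the genotype with the largest summed classifier weight."""
    score = defaultdict(float)
    for call, alpha in zip(calls, alphas):
        score[call] += alpha
    return max(score, key=score.get)


# Toy validation calls from three hypothetical classifiers (made-up data)
truth = [0, 1, 2, 1, 0, 2, 1, 1]
calls_a = [0, 1, 2, 1, 0, 2, 1, 0]
calls_b = [0, 1, 2, 1, 0, 1, 1, 1]
calls_c = [0, 0, 2, 2, 0, 2, 0, 1]

alphas = sequential_weights([calls_a, calls_b, calls_c], truth)
# Impute one unknown genotype on which the classifiers disagree
genotype = weighted_vote([0, 1, 1], alphas)
```

Because each classifier's error is measured against sample weights that the preceding classifiers have already reshaped, passing the same classifiers in a different order yields different voting weights. In this sketch, that is the mechanism by which the sequence of classifiers can affect the final calls.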
