Department of Animal Science, Michigan State University, East Lansing, MI, USA.
BMC Genet. 2013 Feb 21;14:8. doi: 10.1186/1471-2156-14-8.
Genotype imputation is a cost-efficient alternative to the use of high-density genotypes for implementing genomic selection. The objective of this study was to investigate variables affecting imputation accuracy from low-density tagSNP sets in swine (average distance between tagSNP from 100 kb to 1 Mb), selected using LD information, physical location, or accuracy for genotype imputation. We compared imputation accuracy across several low-density tagSNP sets of varying densities, selected using three different methods. In addition, we assessed the effect of varying the size and composition of the reference panel of haplotypes used for imputation.
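As an illustration of tagSNP selection by physical location, the following is a minimal sketch, not the procedure used in the study. It assumes that selection by position means picking the SNP nearest to each point of an evenly spaced grid along a chromosome; the function name `select_evenly_spaced` and its parameters are hypothetical.

```python
from bisect import bisect_left


def select_evenly_spaced(positions, spacing_bp):
    """Pick the SNP closest to each grid point spaced `spacing_bp` apart.

    positions: sorted base-pair positions of the SNP on one chromosome.
    Returns the sorted, de-duplicated list of selected positions.
    Assumption: the grid starts at the first SNP on the chromosome.
    """
    if not positions:
        return []
    selected = set()
    target = positions[0]
    while target <= positions[-1]:
        i = bisect_left(positions, target)
        # Compare the nearest SNP on each side of the grid point.
        candidates = positions[max(i - 1, 0):i + 1]
        selected.add(min(candidates, key=lambda p: abs(p - target)))
        target += spacing_bp
    return sorted(selected)


# Toy example: seven SNP thinned to roughly one tagSNP per 340 kb.
snp_positions = [0, 90_000, 110_000, 200_000, 340_000, 350_000, 700_000]
tag_snp = select_evenly_spaced(snp_positions, 340_000)
```

A set is used so that a SNP that is nearest to two neighboring grid points is kept only once, which can happen in sparsely covered regions.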
A tagSNP density of at least 1 tagSNP per 340 kb (~7000 tagSNP), selected using pairwise LD information, was necessary to achieve an average imputation accuracy higher than 0.95. A commercial low-density (9K) tagSNP set for swine was developed concurrently with this study, and the average imputation accuracy based on these tagSNP was estimated at 0.951. Construction of a haplotype reference panel was most efficient when the haplotypes were obtained from randomly sampled individuals. Increasing the size of the original reference haplotype panel (128 haplotypes sampled from 32 sire/dam/offspring trios phased in a previous study) led to an overall increase in imputation accuracy (IA = 0.97 with 512 haplotypes), and was especially useful for SNP with MAF below 0.1 and for SNP located in the chromosomal extremes (within 5% of a chromosome end).
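The comparison of accuracy for low-MAF SNP against the remaining SNP can be sketched as follows. The abstract does not state how imputation accuracy was computed, so this sketch assumes it is the proportion of correctly imputed alleles, with genotypes coded as 0, 1, or 2 copies of one allele. All function names and the toy data are hypothetical.

```python
def allele_concordance(true_g, imputed_g):
    """Proportion of alleles imputed correctly at one SNP.

    Genotypes are allele counts (0, 1, 2), so |true - imputed| is the
    number of wrong alleles for an individual. This definition of
    accuracy is an assumption, not taken from the abstract.
    """
    wrong = sum(abs(t - m) for t, m in zip(true_g, imputed_g))
    return 1 - wrong / (2 * len(true_g))


def minor_allele_frequency(genotypes):
    """MAF from allele counts at one SNP."""
    p = sum(genotypes) / (2 * len(genotypes))
    return min(p, 1 - p)


def mean_accuracy_by_maf(true_by_snp, imputed_by_snp, maf_cutoff=0.1):
    """Average per-SNP accuracy separately for SNP below and at/above the cutoff."""
    low, other = [], []
    for snp, true_g in true_by_snp.items():
        acc = allele_concordance(true_g, imputed_by_snp[snp])
        bucket = low if minor_allele_frequency(true_g) < maf_cutoff else other
        bucket.append(acc)

    def mean(values):
        return sum(values) / len(values) if values else None

    return {"low_maf": mean(low), "other": mean(other)}


# Toy data: snp_a has MAF 0.05, snp_b has MAF 0.5.
true_genotypes = {"snp_a": [0, 0, 0, 0, 0, 1, 0, 0, 0, 0], "snp_b": [0, 1, 2, 1]}
imputed_genotypes = {"snp_a": [0] * 10, "snp_b": [0, 1, 2, 2]}
summary = mean_accuracy_by_maf(true_genotypes, imputed_genotypes)
```

The same grouping could be applied to SNP within 5% of a chromosome end by replacing the MAF test with a position test.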
The new commercially available 9K tagSNP set can be used to obtain imputed genotypes with high accuracy, even when imputation is based on a comparatively small panel of reference haplotypes (128 haplotypes). Average imputation accuracy can be further increased by adding haplotypes to the reference panel. In addition, our results show that randomly sampling individuals to genotype for the construction of a reference haplotype panel is more cost-efficient than specifically sampling older animals or trios, with no observed loss in imputation accuracy. We expect that the use of imputed genotypes in swine breeding will yield highly accurate predictions of GEBV, based on the observed accuracy and on reported results in dairy cattle, where genomic evaluation of some individuals is based on genotypes imputed with the same accuracy as in our Yorkshire population.