Pausch Hubert, MacLeod Iona M, Fries Ruedi, Emmerling Reiner, Bowman Phil J, Daetwyler Hans D, Goddard Michael E
Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC, 3083, Australia.
Chair of Animal Breeding, Technische Universitaet Muenchen, 85354, Freising, Germany.
Genet Sel Evol. 2017 Feb 21;49(1):24. doi: 10.1186/s12711-017-0301-x.
The availability of dense genotypes and whole-genome sequence variants from various sources offers the opportunity to compile large datasets consisting of tens of thousands of individuals with genotypes at millions of polymorphic sites that may enhance the power of genomic analyses. The imputation of missing genotypes ensures that all individuals have genotypes for a shared set of variants.
We evaluated the accuracy of imputation from dense genotypes to whole-genome sequence variants in 249 Fleckvieh and 450 Holstein cattle using Minimac and FImpute. The sequence variants of a subset of the animals were reduced to the variants that were included on the Illumina BovineHD genotyping array and subsequently inferred in silico using either within- or multi-breed reference populations. The accuracy of imputation varied considerably across chromosomes and dropped in regions where the bovine genome contains segmental duplications. Depending on the imputation strategy, the correlation between imputed and true genotypes ranged from 0.898 to 0.952. The accuracy of imputation was higher with Minimac than with FImpute, particularly for variants with a low minor allele frequency. Using a multi-breed reference population increased the accuracy of imputation, particularly when FImpute was used to infer genotypes. When the sequence variants were imputed using Minimac, the true genotypes were more strongly correlated with predicted allele dosages than with best-guess genotypes. The computing costs to impute 23,256,743 sequence variants in 6958 animals were ten-fold higher with Minimac than with FImpute. Association studies with imputed sequence variants revealed seven quantitative trait loci (QTL) for milk fat percentage. Two causal mutations in the DGAT1 and GHR genes were the most significantly associated variants at two QTL on chromosomes 14 and 20 when Minimac was used to infer genotypes.
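The accuracy measure described above, the correlation between true and imputed genotypes, can be illustrated with a minimal Python sketch. It assumes that genotypes are coded as 0, 1 or 2 copies of the alternate allele and that the imputation software reports three genotype probabilities per animal; the function names and the toy values are hypothetical and are not taken from the study.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equally long numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined for a monomorphic variant
    return sxy / sqrt(sxx * syy)

def dosage_from_probs(probs):
    """Expected alternate-allele count from (P(0), P(1), P(2))."""
    p0, p1, p2 = probs
    return p1 + 2.0 * p2

def best_guess_from_probs(probs):
    """Most likely genotype (0, 1 or 2) from (P(0), P(1), P(2))."""
    return max(range(3), key=lambda g: probs[g])

# Toy data for one variant in six animals (illustrative values only).
true_genotypes = [0, 1, 2, 1, 0, 2]
genotype_probs = [
    (0.90, 0.10, 0.00),
    (0.20, 0.70, 0.10),
    (0.05, 0.45, 0.50),
    (0.55, 0.40, 0.05),
    (0.80, 0.20, 0.00),
    (0.00, 0.30, 0.70),
]

dosages = [dosage_from_probs(p) for p in genotype_probs]
best_guesses = [best_guess_from_probs(p) for p in genotype_probs]

r_dosage = pearson(true_genotypes, dosages)
r_best_guess = pearson(true_genotypes, best_guesses)
print(f"r(true, dosage)     = {r_dosage:.3f}")
print(f"r(true, best guess) = {r_best_guess:.3f}")
```

Rounding to the best-guess genotype discards the uncertainty carried by the genotype probabilities, which is one plausible reason why dosages can track the true genotypes more closely.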
The population-based imputation of millions of sequence variants in large cohorts is computationally feasible and provides accurate genotypes. However, the accuracy of imputation is low in regions where the genome contains large segmental duplications or the coverage with array-derived single nucleotide polymorphisms is poor. Using a reference population that includes individuals from many breeds increases the accuracy of imputation, particularly at low-frequency variants. Considering allele dosages rather than best-guess genotypes as explanatory variables is advantageous for detecting causal mutations in association studies with imputed sequence variants.
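Using allele dosages as the explanatory variable can be sketched as a single-variant regression of a phenotype on dosage. The example below is a deliberately simplified stand-in: it fits ordinary least squares for one variant and ignores relatedness and population structure, so it should not be read as the association model used in the study. The function name and the toy phenotypes are hypothetical.

```python
from math import sqrt

def single_marker_regression(dosages, phenotypes):
    """Ordinary least squares of phenotype on allele dosage.

    Returns (slope, standard_error, t_statistic) for the dosage effect.
    """
    n = len(dosages)
    if n != len(phenotypes) or n < 3:
        raise ValueError("need two sequences of equal length >= 3")
    mean_x = sum(dosages) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("variant is monomorphic; effect is not estimable")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, phenotypes))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares with n - 2 degrees of freedom.
    rss = sum((y - intercept - slope * x) ** 2
              for x, y in zip(dosages, phenotypes))
    std_err = sqrt(rss / (n - 2) / sxx)
    t_stat = slope / std_err if std_err > 0 else float("inf")
    return slope, std_err, t_stat

# Toy data: imputed dosages and milk fat percentage (illustrative values only).
dosages = [0.10, 0.90, 1.45, 0.50, 0.20, 1.70]
fat_percentage = [3.9, 4.1, 4.4, 4.0, 3.9, 4.5]

slope, std_err, t_stat = single_marker_regression(dosages, fat_percentage)
print(f"effect per allele copy = {slope:.3f} (SE {std_err:.3f}, t = {t_stat:.2f})")
```

Because the dosage enters the regression as a continuous value between 0 and 2, uncertain imputations contribute less extreme values than a hard genotype call would.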