Heidaritabar Marzieh, Calus Mario P L, Vereijken Addie, Groenen Martien A M, Bastiaansen John W M
Animal Breeding and Genomics Centre, Wageningen University, P.O. Box 338, 6700 AH, Wageningen, the Netherlands.
Animal Breeding and Genomics Centre, Wageningen UR Livestock Research, P.O. Box 338, 6700 AH, Wageningen, the Netherlands.
BMC Genet. 2015 Aug 18;16:101. doi: 10.1186/s12863-015-0253-5.
Genotype imputation has become a standard practice in modern genetic research to increase genome coverage and improve the accuracy of genomic selection (GS) and genome-wide association studies (GWAS). We assessed accuracies of imputing 60K genotype data from lower density single nucleotide polymorphism (SNP) panels using a small set of the most common sires in a population of 2140 white layer chickens. Several factors affecting imputation accuracy were investigated, including the size of the reference population, the level of the relationship between the reference and validation populations, and minor allele frequency (MAF) of the SNP being imputed.
The accuracy of imputation was assessed in different scenarios using 22 and 62 carefully selected reference animals (Ref(22) and Ref(62)). Animal-specific imputation accuracy, corrected for gene content, was moderate on average (~0.80) in most scenarios and low in the 3K to 60K scenario. Maximum average accuracies in the most favourable scenario were 0.90 for Ref(22) and 0.93 for Ref(62), when SNPs were masked independently of their MAF. SNPs with low MAF were more difficult to impute, and the larger reference population considerably improved the imputation accuracy for these rare SNPs. When Ref(22) was used for imputation, the average imputation accuracy decreased by 0.04 when the validation population was two generations instead of one generation away from the reference, and increased again by 0.05 when the validation population was three generations away. Selecting the reference animals from the most common sires, rather than at random from the population, considerably improved imputation accuracy for low-MAF SNPs, but gave only limited improvement for other MAF classes. The allelic R² measure from the Beagle software was found to be a good predictor of imputation reliability (correlation ~0.8) when the density of the validation panel was very low (3K) and neither the MAF of the SNP nor the size of the reference population was extremely small.
Even with a very small number of animals in the reference population, reasonable accuracy of imputation can be achieved. Selecting a set of the most common sires, rather than selecting random animals for the reference population, improves the imputation accuracy of rare alleles, which may be a benefit when imputing with whole genome re-sequencing data.
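The abstract reports animal-specific imputation accuracy "corrected for gene content" but does not state the formula. As a minimal sketch, assuming the correction means centering each SNP by its mean gene content (twice the allele frequency) before correlating true and imputed genotypes within each animal, the calculation could look as follows. The function names, the 0/1/2 genotype coding, and the source of the allele frequencies are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def minor_allele_frequency(allele_freq):
    """Return the MAF for each SNP, given the frequency of the counted allele."""
    p = np.asarray(allele_freq, dtype=float)
    return np.minimum(p, 1.0 - p)


def animal_accuracy(true_genotypes, imputed_genotypes, allele_freq):
    """Per-animal correlation between true and imputed genotypes.

    Genotypes are assumed to be coded 0/1/2 (copies of the counted allele),
    with animals in rows and SNPs in columns. Each SNP is centered by its
    mean gene content, 2 * allele frequency, before correlating (assumption).
    """
    center = 2.0 * np.asarray(allele_freq, dtype=float)
    true_c = np.asarray(true_genotypes, dtype=float) - center
    imputed_c = np.asarray(imputed_genotypes, dtype=float) - center
    accuracy = np.empty(true_c.shape[0])
    for i in range(true_c.shape[0]):
        # Pearson correlation across SNPs for animal i
        accuracy[i] = np.corrcoef(true_c[i], imputed_c[i])[0, 1]
    return accuracy


def simulate_genotypes(n_animals=5, n_snps=200, seed=0):
    """Simulate toy allele frequencies and 0/1/2 genotypes (illustration only)."""
    rng = np.random.default_rng(seed)
    freq = rng.uniform(0.05, 0.95, size=n_snps)
    genotypes = rng.binomial(2, freq, size=(n_animals, n_snps))
    return freq, genotypes
```

Centering matters because, without it, animals would score a high correlation simply by being assigned the common genotype at every SNP. Subtracting the mean gene content removes that baseline, so the correlation reflects only how well animal-specific deviations are recovered. Grouping SNPs by `minor_allele_frequency` before averaging gives the per-MAF-class view discussed in the abstract.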