Korkuć Paula, Arends Danny, Brockmann Gudrun A
Animal Breeding Biology and Molecular Genetics, Albrecht Daniel Thaer-Institute for Agricultural and Horticultural Sciences, Humboldt University of Berlin, Berlin, Germany.
Front Genet. 2019 Feb 18;10:52. doi: 10.3389/fgene.2019.00052. eCollection 2019.
Imputation from lower-density SNP chip genotypes to whole-genome sequence level is an established approach for generating high-density genotypes for many individuals. Imputation accuracy depends on many factors, and for small cattle populations such as the endangered German Black Pied cattle (DSN), determining the optimal imputation strategy is especially challenging because only a small number of high-density genotypes is available. In this paper, the accuracy of imputation was explored with regard to (1) phasing of the target population and the reference panel for imputation, (2) comparison of a 1-step imputation approach, where 50 k genotypes are imputed directly to sequence level, with a 2-step imputation approach that first imputes to 700 k and subsequently to sequence level, (3) the software tools Beagle and Minimac, and (4) the size and composition of the reference panel for imputation. Analyses were performed for 30 DSN and 30 Holstein Friesian cattle available from the 1000 Bull Genomes Project. Imputation accuracy was assessed using a leave-one-out cross-validation procedure. We observed that phasing of the target populations and the reference panels affects imputation accuracy significantly. Minimac reached higher accuracy when imputing with small reference panels, while Beagle performed better with larger reference panels. In contrast to previous research, we found that when only a small number of animals is available at the intermediate imputation step, the 1-step imputation approach yielded higher imputation accuracy than the 2-step approach. Overall, the size of the reference panel is the most important factor leading to higher imputation accuracy, although using a larger reference panel consisting of a related but different breed (Holstein Friesian) significantly reduced imputation accuracy.
Our findings provide specific recommendations for populations such as DSN, for which only a limited number of high-density genotyped or sequenced animals is available. The overall recommendations when imputing a small population are to (1) use a large reference panel of the same breed, (2) use a large reference panel consisting of diverse breeds, or (3) when a large reference panel is not available, use a smaller same-breed reference panel without including a different related breed.
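The leave-one-out cross-validation described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the abstract does not state which accuracy metric was used, so genotype concordance at the masked sites is assumed here, and `mode_imputer` is a hypothetical toy stand-in for Beagle or Minimac that simply fills each masked site with the most common genotype in the reference panel. The function names and the data layout (allele dosages 0/1/2 per SNP) are likewise assumptions made for illustration.

```python
from typing import Callable, Dict, List, Sequence

Genotypes = List[int]  # allele dosage per SNP: 0, 1 or 2


def mode_imputer(reference: List[Genotypes],
                 observed: Dict[int, int],
                 n_sites: int) -> Genotypes:
    """Hypothetical toy imputer (NOT Beagle or Minimac).

    Chip genotypes are kept as observed; every other site is filled with the
    most common genotype among the reference animals (ties go to the lowest
    dosage).
    """
    imputed = []
    for site in range(n_sites):
        if site in observed:
            imputed.append(observed[site])
        else:
            column = [animal[site] for animal in reference]
            imputed.append(max(sorted(set(column)), key=column.count))
    return imputed


def leave_one_out_accuracy(
    panel: Dict[str, Genotypes],
    chip_sites: Sequence[int],
    impute: Callable[[List[Genotypes], Dict[int, int], int], Genotypes],
) -> Dict[str, float]:
    """Leave each animal out once, mask it down to the chip sites, impute it
    from the remaining animals, and score concordance at the masked sites."""
    chip = set(chip_sites)
    n_sites = len(next(iter(panel.values())))
    hidden = [s for s in range(n_sites) if s not in chip]
    accuracy = {}
    for animal_id, truth in panel.items():
        # Reference panel = all sequenced animals except the one left out.
        reference = [g for other, g in panel.items() if other != animal_id]
        # The target animal only "has" its chip genotypes.
        observed = {s: truth[s] for s in chip}
        imputed = impute(reference, observed, n_sites)
        matches = sum(imputed[s] == truth[s] for s in hidden)
        accuracy[animal_id] = matches / len(hidden)
    return accuracy


if __name__ == "__main__":
    # Made-up toy data: three animals, four SNPs, sites 0 and 2 on the chip.
    toy_panel = {"a": [0, 1, 2, 0], "b": [0, 1, 2, 0], "c": [0, 1, 2, 1]}
    print(leave_one_out_accuracy(toy_panel, chip_sites=[0, 2], impute=mode_imputer))
```

In a real analysis the `impute` argument would wrap a call to the imputation software, and the reference list passed to it could be varied in size and breed composition to mirror the comparisons made in the study.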