Jattawa Danai, Elzo Mauricio A, Koonawootrittriron Skorn, Suwanasopee Thanathip
Department of Animal Sciences, University of Florida, Gainesville, FL 32611-0910, USA.
Asian-Australas J Anim Sci. 2016 Apr;29(4):464-70. doi: 10.5713/ajas.15.0291. Epub 2016 Apr 1.
The objective of this study was to investigate the accuracy of genotype imputation from low-density (LDC) to moderate-density (MDC) single nucleotide polymorphism (SNP) chips in a Thai Holstein-Other multibreed dairy cattle population. Dairy cattle with complete pedigree information (n = 1,244) from 145 dairy farms were genotyped with the GeneSeek GGP20K (n = 570), GGP26K (n = 540), and GGP80K (n = 134) chips. After SNP quality checks, the 17,779 SNP markers shared by the GGP20K, GGP26K, and GGP80K chips were used to represent the MDC. Animals were divided into two groups: a reference group (n = 912) and a test group (n = 332). In the test group, only the SNP markers located at positions corresponding to the GeneSeek GGP9K chip (n = 7,652) were retained to represent the LDC. Imputation from LDC to MDC genotypes was carried out with three software packages: Beagle 3.3 (population-based algorithm), FImpute 2.2 (combined family- and population-based algorithms), and Findhap 4 (combined family- and population-based algorithms). Imputation accuracy within and across chromosomes was calculated as the ratio of correctly imputed SNP markers to all imputed SNP markers. Imputation accuracy for the three packages ranged from 76.79% to 93.94%: FImpute was more accurate (93.94%) than Findhap (84.64%) and Beagle (76.79%). FImpute's imputation accuracies were similar and consistent across chromosomes, whereas those of Findhap and Beagle were not. Most chromosomes with high (73%) or low (80%) imputation accuracy were, respectively, the same chromosomes that had above- or below-average linkage disequilibrium (LD; defined here as the correlation between adjacent SNP pairs no more than 1 Mb apart within a chromosome). These results indicate that FImpute was more suitable than Findhap and Beagle for genotype imputation in this Thai multibreed population. Imputation accuracy might be improved further by increasing the completeness of pedigree information.
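The two metrics defined in the abstract, imputation accuracy (correctly imputed SNP markers over all imputed SNP markers, within and across chromosomes) and LD between adjacent SNP pairs no more than 1 Mb apart, can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' code: the data layout (genotypes as 0/1/2 allele dosages keyed by hypothetical SNP identifiers, and a `snp_info` map from SNP to chromosome and base-pair position), the function names, and the choice of the mean Pearson correlation of genotype dosages as each chromosome's LD summary are all assumptions. The abstract says "correlation" without specifying r or r²; the sketch returns r.

```python
from collections import defaultdict
from math import sqrt

# Assumed layout (hypothetical): genotypes[animal][snp] = dosage 0/1/2,
# snp_info[snp] = (chromosome, position_bp).


def imputation_accuracy(true_geno, imputed_geno, masked_snps, snp_info):
    """Overall and per-chromosome accuracy over the masked (imputed) SNPs:
    correctly imputed genotypes / all imputed genotypes."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for animal, truth in true_geno.items():
        for snp in masked_snps:
            chrom = snp_info[snp][0]
            total[chrom] += 1
            if imputed_geno[animal][snp] == truth[snp]:
                correct[chrom] += 1
    per_chrom = {chrom: correct[chrom] / n for chrom, n in total.items()}
    overall = sum(correct.values()) / sum(total.values())  # across chromosomes
    return overall, per_chrom


def pearson(x, y):
    """Pearson correlation; None if either SNP is monomorphic."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def adjacent_ld(genotypes, snp_info, max_dist_bp=1_000_000):
    """Mean correlation between adjacent SNPs <= max_dist_bp apart, per chromosome."""
    by_chrom = defaultdict(list)
    for snp, (chrom, pos) in snp_info.items():
        by_chrom[chrom].append((pos, snp))
    animals = list(genotypes)
    result = {}
    for chrom, snps in by_chrom.items():
        snps.sort()  # order SNPs by position so neighbours are adjacent
        values = []
        for (p1, s1), (p2, s2) in zip(snps, snps[1:]):
            if p2 - p1 > max_dist_bp:
                continue
            r = pearson([genotypes[a][s1] for a in animals],
                        [genotypes[a][s2] for a in animals])
            if r is not None:
                values.append(r)
        if values:
            result[chrom] = sum(values) / len(values)
    return result


# Toy data, made up for illustration only.
snp_info = {"s1": (1, 100), "s2": (1, 50_000), "s3": (2, 200)}
truth = {"a1": {"s1": 0, "s2": 1, "s3": 2}, "a2": {"s1": 2, "s2": 2, "s3": 0}}
imputed = {"a1": {"s1": 0, "s2": 1, "s3": 1}, "a2": {"s1": 2, "s2": 2, "s3": 0}}
low_density = {"s1"}  # SNPs kept on the low-density panel (hypothetical)
masked = [s for s in snp_info if s not in low_density]  # SNPs to be imputed

overall, per_chrom = imputation_accuracy(truth, imputed, masked, snp_info)
print(overall, per_chrom)  # 0.75 {1: 1.0, 2: 0.5}
print(adjacent_ld(truth, snp_info))  # {1: 1.0}
```

Classifying a chromosome as having above- or below-average LD then amounts to comparing each value returned by `adjacent_ld` with the mean of those values across chromosomes.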