Ye Shaopan, Gao Ning, Zheng Rongrong, Chen Zitao, Teng Jinyan, Yuan Xiaolong, Zhang Hao, Chen Zanmou, Zhang Xiquan, Li Jiaqi, Zhang Zhe
Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China.
State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China.
Front Genet. 2019 Jul 17;10:673. doi: 10.3389/fgene.2019.00673. eCollection 2019.
Genomic prediction with imputed whole-genome sequencing (WGS) data is an attractive approach to improving predictive ability at low cost. However, high accuracy has not yet been achieved with this method in livestock. In this study, we imputed 435 individuals from 600K single nucleotide polymorphism (SNP) chip data to WGS data using different reference panels. We also investigated the prediction accuracy of genomic best linear unbiased prediction (GBLUP) using imputed WGS data from different reference panels, linkage disequilibrium (LD)-based marker pruning, and variants pre-selected based on genome-wide association study (GWAS) results. The imputation accuracies from 600K to WGS data were 0.873 ± 0.038, 0.906 ± 0.036, and 0.979 ± 0.010 for the internal, external, and combined reference panels, respectively. For most traits in chickens, the prediction accuracy of imputed WGS data obtained from the internal reference panel was greater than or equal to that of the combined reference panel; the external reference panel had the lowest prediction accuracy. Compared with 600K chip data, GBLUP with imputed WGS data gave only a small increase (1-3%) in prediction accuracy. Using only variants selected from imputed WGS data based on GWAS results gave almost no increase for most traits and even increased the bias of the regression coefficient. The degree of LD of the selected variants and of the remaining variants affected prediction accuracy differently. For average daily gain (ADG), residual feed intake (RFI), intestine length (IL), and body weight at 91 days (BW91), the accuracy of GBLUP increased as the degree of LD of the selected variants decreased, whereas the opposite relationship held for the remaining variants. For breast muscle weight (BMW) and average daily feed intake (ADFI), however, the accuracy of GBLUP increased as the degree of LD of the selected variants increased, and the degree of LD of the remaining variants had only a small effect on prediction accuracy.
Overall, the optimal imputation strategy to obtain WGS data for genomic prediction should consider the relationship between selected individuals and target population individuals to avoid heterogeneity of imputation. LD-based marker pruning can be used to improve the accuracy of genomic prediction using imputed WGS data.
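For readers unfamiliar with GBLUP, the following is a minimal illustrative sketch in Python, not the pipeline used in the study. It assumes a VanRaden-style genomic relationship matrix, a single overall mean as the only fixed effect, and a heritability `h2` that is supplied rather than estimated. The function names, the simulated genotypes, and the train/validation split are all hypothetical. Prediction accuracy is taken here as the correlation between predicted and observed phenotypes in the validation set, and bias as the slope of the regression of observed on predicted values.

```python
import numpy as np

def vanraden_grm(M):
    """Genomic relationship matrix from an (n, m) genotype matrix coded 0/1/2."""
    p = M.mean(axis=0) / 2.0                 # observed allele frequencies
    Z = M - 2.0 * p                          # centre each marker column
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def gblup_predict(G, y, train, test, h2):
    """Predict phenotypes of `test` individuals from `train` individuals.

    Assumes the model y = mu + g + e with var(g) = G * sigma_g^2 and a
    known heritability h2 (an assumption; in practice the variance
    components are usually estimated, e.g. by REML).
    """
    lam = (1.0 - h2) / h2                    # sigma_e^2 / sigma_g^2
    V = G[np.ix_(train, train)] + lam * np.eye(len(train))
    Vinv_y = np.linalg.solve(V, y[train])
    Vinv_1 = np.linalg.solve(V, np.ones(len(train)))
    mu = Vinv_y.sum() / Vinv_1.sum()         # generalized least squares mean
    alpha = Vinv_y - mu * Vinv_1             # V^{-1} (y - mu)
    return mu + G[np.ix_(test, train)] @ alpha

def accuracy_and_slope(y_obs, y_pred):
    """Correlation (accuracy) and regression slope of observed on predicted."""
    r = np.corrcoef(y_obs, y_pred)[0, 1]
    slope = np.cov(y_obs, y_pred)[0, 1] / np.var(y_pred, ddof=1)
    return r, slope

# Simulated data for illustration only (not data from the study).
rng = np.random.default_rng(0)
n, m = 200, 500
freqs = rng.uniform(0.1, 0.9, size=m)
M = rng.binomial(2, freqs, size=(n, m)).astype(float)
g = M @ rng.normal(0.0, 1.0, size=m)
g = (g - g.mean()) / g.std()
y = g + rng.normal(0.0, 1.0, size=n)         # simulated h2 of about 0.5

G = vanraden_grm(M)
train, test = np.arange(150), np.arange(150, n)
pred = gblup_predict(G, y, train, test, h2=0.5)
acc, slope = accuracy_and_slope(y[test], pred)
print(f"accuracy = {acc:.3f}, regression slope = {slope:.3f}")
```

A slope near 1 indicates little inflation or deflation of the predictions; departures from 1 correspond to the kind of bias in the regression coefficient mentioned in the abstract. Comparing marker sets (chip, imputed WGS, LD-pruned, or GWAS-selected) would amount to building `G` from different columns of `M` and repeating the evaluation.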