Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, 510642, China.
Key Laboratory of Animal Genetics and Breeding of the Ministry of Agriculture and Rural Affairs, National Engineering Laboratory for Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China.
Genet Sel Evol. 2022 Jun 27;54(1):47. doi: 10.1186/s12711-022-00737-3.
Compared to medium-density single nucleotide polymorphism (SNP) data, high-density SNP data contain more genetic variants and provide more information for the genetic evaluation of livestock, yet they have been shown to confer no advantage for genomic prediction or heritability estimation. One possible reason is the uneven distribution of linkage disequilibrium (LD) along the genome, i.e., LD heterogeneity among regions. The aim of this study was to use genome-wide SNP data effectively for genomic prediction and heritability estimation by applying models that control LD heterogeneity among regions.
The LD-adjusted kinship (LDAK) and LD-stratified multicomponent (LDS) models were used to control LD heterogeneity among regions and were compared with the classical model, which has no such control. Simulated and real traits of 2000 dairy cattle with imputed high-density (770K) SNP data were used. Five types of phenotypes were simulated, controlled by very strongly, strongly, moderately, weakly and very weakly tagged causal variants, respectively. The performance of each model was compared between the high-density and medium-density (50K) panels to verify that the models controlling LD heterogeneity among regions were more effective with high-density data.
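The LD-stratification idea described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not necessarily the procedure used in the study: the regional LD score is taken as the mean r² with neighbouring SNPs in a fixed window, the strata are quantiles of that score, and each stratum gets its own VanRaden-type genomic relationship matrix (GRM). The function names, the window size and the number of strata are all hypothetical choices.

```python
import numpy as np

def ld_scores(genotypes, window=50):
    """Mean r^2 between each SNP and its neighbours within `window` SNPs.

    genotypes: (n_individuals, n_snps) array of 0/1/2 allele counts.
    The window is counted in SNPs, which is an assumption of this sketch.
    """
    n, m = genotypes.shape
    z = genotypes - genotypes.mean(axis=0)
    sd = z.std(axis=0)
    sd[sd == 0] = 1.0  # monomorphic SNPs get zero correlation with everything
    z = z / sd
    scores = np.empty(m)
    for j in range(m):
        lo, hi = max(0, j - window), min(m, j + window + 1)
        r2 = (z[:, lo:hi].T @ z[:, j] / n) ** 2
        # drop the SNP's correlation with itself before averaging
        scores[j] = (r2.sum() - r2[j - lo]) / max(hi - lo - 1, 1)
    return scores

def stratified_grms(genotypes, n_strata=4, window=50):
    """Assign SNPs to LD-score quantile strata and build one GRM per stratum."""
    scores = ld_scores(genotypes, window)
    edges = np.quantile(scores, np.linspace(0, 1, n_strata + 1)[1:-1])
    strata = np.searchsorted(edges, scores, side="right")  # labels 0..n_strata-1
    freq = genotypes.mean(axis=0) / 2.0
    centered = genotypes - 2.0 * freq
    grms = []
    for k in range(n_strata):
        idx = strata == k
        denom = 2.0 * np.sum(freq[idx] * (1.0 - freq[idx]))
        if denom <= 0:
            raise ValueError(f"stratum {k} has no polymorphic SNPs")
        grms.append(centered[:, idx] @ centered[:, idx].T / denom)
    return strata, grms
```

The resulting matrices would then enter a mixed model as separate random effects, each with its own variance component, so that strongly and weakly tagged regions are not forced to share a single variance.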
Compared to the medium-density panel, the high-density panel did not improve, and in some cases decreased, the prediction accuracies and heritability estimates from the classical model for both simulated and real traits. Compared to the classical model, LDS improved the accuracy of genomic predictions and the unbiasedness of heritability estimates, regardless of the genetic architecture of the trait. LDAK was applicable only to traits that are mainly controlled by weakly tagged causal variants, and even for this type of trait it was less effective than LDS. Compared with the classical model, LDS improved prediction accuracy by about 13% for simulated phenotypes and by 0.3 to ~10.7% for real traits with the high-density panel, and by ~1% for simulated phenotypes and by -0.1 to ~6.9% for real traits with the medium-density panel.
Grouping SNPs based on regional LD to construct the LD-stratified multicomponent model can effectively eliminate the adverse effects of LD heterogeneity among regions, and greatly improve the efficiency of high-density SNP data for genomic prediction and heritability estimation.
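For reference, a multicomponent model of this kind is commonly written as below. This is the generic form of a stratified variance-component model, given here as an assumption about notation rather than as the exact specification used in the study; $K$ is the number of LD strata, $\mathbf{G}_k$ is the genomic relationship matrix built from the SNPs in stratum $k$, and $\mathbf{X}\mathbf{b}$ collects the fixed effects.

```latex
\mathbf{y} = \mathbf{X}\mathbf{b} + \sum_{k=1}^{K} \mathbf{g}_k + \mathbf{e},
\qquad
\mathbf{g}_k \sim N\!\left(\mathbf{0},\, \mathbf{G}_k \sigma^2_{g_k}\right),
\qquad
\mathbf{e} \sim N\!\left(\mathbf{0},\, \mathbf{I} \sigma^2_e\right)

\hat{h}^2 = \frac{\sum_{k=1}^{K} \hat{\sigma}^2_{g_k}}
                 {\sum_{k=1}^{K} \hat{\sigma}^2_{g_k} + \hat{\sigma}^2_e}
```

With $K = 1$ this reduces to the classical single-component model, in which all SNPs share one variance regardless of regional LD.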