Genomic breeding value estimation using genetic markers, inferred ancestral haplotypes, and the genomic relationship matrix.

Affiliation

CRV, PO Box 454, 6800 AL Arnhem, the Netherlands.

Publication information

J Dairy Sci. 2011 Sep;94(9):4708-14. doi: 10.3168/jds.2010-3905.

Abstract

With the introduction of new single nucleotide polymorphism (SNP) chips of various densities, more and more genotype data sets will include animals genotyped for only a subset of the SNP. Imputation techniques based on unobserved ancestral haplotypes may be used to infer missing genotypes. These ancestral haplotypes may also be used in the genomic prediction model, instead of using the SNP. This may increase the reliability of predictions because the ancestral haplotype may capture more linkage disequilibrium with quantitative trait loci than SNP. The aim of this paper was to study whether using unobserved ancestral haplotypes in a genomic prediction model would provide more reliable genomic predictions than using SNP, and to determine how many loci in the genomic prediction model would be redundant. Genotypes of 8,960 bulls and cows for 39,557 SNP were analyzed with a hidden Markov model to associate each individual at each locus to 2 ancestral haplotypes. The number of ancestral haplotypes per locus was fixed at 10, 15, or 20. Subsequently, a validation study was performed in which the phenotypes of 3,251 progeny-tested bulls for 16 traits were used in a genomic prediction model to predict the estimated breeding values of at least 753 validation bulls. The squared correlation between genomic prediction and deregressed daughter performance estimated breeding value, when averaged across traits, was slightly higher when 15 or 20 ancestral haplotypes per locus were used in the prediction model instead of the SNP genotypes, whereas the prediction model using a genomic relationship matrix gave the lowest squared correlations. The number of redundant loci [i.e., loci that had fewer than 18 jumps (0.1%) from one ancestral haplotype to another ancestral haplotype at the next locus] was 18,793 (48%), which means that only 20,764 loci would need to be included in the genomic prediction model. This provides opportunities for greatly decreasing computer requirements of genomic evaluations with very large numbers of markers.
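
The redundancy criterion described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not stated in the source: that the hidden Markov model output is available as an integer array `hap` with one row per haplotype (two rows per animal) and one column per locus, that a "jump" is a change of ancestral haplotype label between adjacent loci, and that the last locus (which has no next locus) is kept. The function name `redundant_loci` and the toy data are hypothetical. The threshold of 18 jumps appears consistent with 0.1% of 2 × 8,960 = 17,920 haplotypes, although the abstract does not spell this out.

```python
import math
import numpy as np


def redundant_loci(hap, frac=0.001):
    """Flag loci with few ancestral-haplotype jumps to the next locus.

    hap  : (n_haplotypes, n_loci) integer array of ancestral haplotype
           labels (hypothetical layout, two rows per animal).
    frac : jump threshold as a fraction of haplotypes (0.1% in the abstract).
    """
    hap = np.asarray(hap)
    n_hap, n_loci = hap.shape
    # Smallest whole number of jumps at or above frac * n_hap.
    threshold = math.ceil(frac * n_hap)
    # jumps[j] = number of haplotypes whose label changes between
    # locus j and locus j + 1.
    jumps = (hap[:, :-1] != hap[:, 1:]).sum(axis=0)
    redundant = np.zeros(n_loci, dtype=bool)
    # Assumption: the last locus has no "next locus" and is never flagged.
    redundant[:-1] = jumps < threshold
    return redundant, jumps, threshold


# Toy data: 4 haplotypes (2 animals) at 4 loci, made up for illustration.
toy = np.array([
    [0, 0, 1, 1],
    [2, 2, 2, 2],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
])
flags, jumps, threshold = redundant_loci(toy)
print("jumps per adjacent pair:", jumps.tolist())
print("redundant loci:", flags.tolist())

# Arithmetic check against the figures quoted in the abstract.
n_haplotypes = 2 * 8960             # 17,920 haplotypes
print(math.ceil(0.001 * n_haplotypes))   # jump threshold
print(39557 - 18793)                     # loci remaining in the model
print(round(100 * 18793 / 39557))        # redundant share in percent
```

In this sketch a flagged locus carries almost the same ancestral haplotype assignment as its neighbor, so dropping it from the prediction model would lose little information; how the original study handled the removal in practice is not described in the abstract.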
