He Jun, Guo Yage, Xu Jiaqi, Li Hao, Fuller Anna, Tait Richard G, Wu Xiao-Lin, Bauck Stewart
Biostatistics and Bioinformatics, Neogen GeneSeek Operations, Lincoln, NE, USA.
College of Animal Science and Technology, Hunan Agricultural University, Changsha, China.
BMC Genet. 2018 Aug 9;19(1):56. doi: 10.1186/s12863-018-0654-3.
SNPs are informative for estimating the genomic breed composition (GBC) of individual animals, but no SNP sets selected for this purpose were available on commercial bovine SNP chips prior to the present study. The primary objective was to select five common SNP panels for estimating the GBC of individual animals, initially involving 10 cattle breeds (two dairy breeds and eight beef breeds). The performance of the five panels was evaluated under an admixture model and under a linear regression model. Finally, the downstream implications of GBC for genomic prediction accuracy were investigated and discussed in a Santa Gertrudis cattle population.
A total of 15,708 SNPs were common to five currently available commercial bovine SNP chips. From this set, four subsets (1,000, 3,000, 5,000, and 10,000 SNPs) were selected by maximizing the average Euclidean distance (AED) of SNP allele frequencies among the ten cattle breeds. For 198 animals presented as Akaushi, the estimated GBC of the Akaushi breed (GBCA) based on the admixture model agreed very well among the five SNP panels, identifying 166 animals with GBCA = 1. Using the same SNP panels, the linear regression approach reported fewer animals with GBCA = 1. Nevertheless, GBCA estimates from the two models were highly correlated (r = 0.953 to 0.992). In genomic prediction for a Santa Gertrudis population (and crosses), SNP effects were estimated from 1,225 animals with a GBC of Santa Gertrudis (GBCSG) of at least 0.90, and the predictability of the resulting molecular breeding values decreased in crossbred animals with lower GBCSG.
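As a rough illustration of the AED-based selection described above, the following sketch scores each SNP by the mean pairwise distance between breed allele frequencies and keeps the top-scoring SNPs. This is an assumption about how such a criterion could be implemented, not the authors' procedure: the function name `select_snps_by_aed`, the per-SNP scoring (in one dimension the Euclidean distance reduces to an absolute difference), and the simple top-k ranking are all hypothetical.

```python
from itertools import combinations

import numpy as np


def select_snps_by_aed(freqs, k):
    """Rank SNPs by average pairwise distance in allele frequency.

    freqs: array of shape (n_breeds, n_snps) with allele frequencies in [0, 1].
    k: number of SNPs to keep.
    Returns (indices of the k highest-scoring SNPs, per-SNP scores).

    Hypothetical sketch: the published selection may differ in detail.
    """
    freqs = np.asarray(freqs, dtype=float)
    n_breeds, n_snps = freqs.shape
    pairs = list(combinations(range(n_breeds), 2))

    # Per-SNP score: mean absolute frequency difference over all breed pairs.
    score = np.zeros(n_snps)
    for i, j in pairs:
        score += np.abs(freqs[i] - freqs[j])
    score /= len(pairs)

    # Highest scores first; a stable sort keeps original order among ties.
    order = np.argsort(-score, kind="stable")
    return order[:k], score
```

A SNP whose allele frequency is identical in every breed scores zero under this criterion and is therefore the last to be selected.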
Of the two statistical models used to compute GBC, the admixture model gave more consistent results among the five selected SNP panels than the linear regression model. The availability of these common SNP panels facilitates the identification and estimation of breed composition using currently available bovine SNP chips. In terms of utility, the 1 K panel is the most cost-effective and can conveniently be included as add-on content in future bovine SNP chips, whereas the 10 K and 16 K SNP panels can be more useful if used independently for imputation to intermediate- or high-density genotypes.
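To make the linear regression approach to GBC concrete, here is a minimal sketch under stated assumptions: an animal's genotype (allele counts scaled to 0 to 1) is regressed on the reference allele frequencies of each breed by ordinary least squares, and the coefficients are then clipped at zero and rescaled to sum to one. The function name `estimate_gbc_regression` and the clip-and-rescale step are hypothetical simplifications and may not match the constraints used in the study.

```python
import numpy as np


def estimate_gbc_regression(genotype, ref_freqs):
    """Estimate breed proportions for one animal by least squares.

    genotype: array of shape (n_snps,) with allele counts 0, 1 or 2.
    ref_freqs: array of shape (n_breeds, n_snps) with reference allele
        frequencies of the counted allele in each breed.
    Returns an array of shape (n_breeds,) that is non-negative and sums to 1.

    Hypothetical sketch: clipping and rescaling stand in for a properly
    constrained fit.
    """
    y = np.asarray(genotype, dtype=float) / 2.0   # scale counts to [0, 1]
    X = np.asarray(ref_freqs, dtype=float).T      # shape (n_snps, n_breeds)

    # Ordinary least squares without an intercept.
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)

    # Force coefficients to be valid proportions.
    coef = np.clip(coef, 0.0, None)
    total = coef.sum()
    if total == 0.0:
        raise ValueError("all regression coefficients were non-positive")
    return coef / total
```

With two reference breeds fixed for opposite alleles, a homozygous animal matching the first breed yields proportions close to (1, 0), and an animal heterozygous at every SNP yields proportions close to (0.5, 0.5).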