Animal & Grassland Research and Innovation Centre, Teagasc, Moorepark, Fermoy, Co. Cork, Ireland.
UCD School of Mathematics and Statistics, University College Dublin, Belfield, Dublin 4, Ireland.
BMC Genomics. 2020 Mar 4;21(1):205. doi: 10.1186/s12864-020-6627-8.
The trading of individual animal genotype information often involves only the exchange of the called genotypes and not necessarily the additional information required to effectively call structural variants. The main aim here was to determine if it is possible to impute copy number variants (CNVs) using the flanking single nucleotide polymorphism (SNP) haplotype structure in cattle. While this objective was pursued using a high-density genotype panel (i.e., 713,162 SNPs), a secondary objective was to investigate the concordance between CNVs called from this high-density panel and CNVs called from a medium-density panel (i.e., 45,677 SNPs in the present study). This is the first study to compare CNVs called from high-density and medium-density SNP genotypes from the same animals. Both high-density and medium-density genotypes were available on 991 Holstein-Friesian, 1015 Charolais, and 1394 Limousin bulls. The concordance between CNVs called from the medium-density and high-density genotypes was calculated separately for each animal. A subset of CNVs which were called from the high-density genotypes was selected for imputation. Imputation was carried out separately for each breed using a set of high-density SNPs flanking the midpoint of each CNV. A CNV was deemed to be imputed correctly when the called copy number matched the imputed copy number.
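The per-animal concordance check described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `CNV` record, the `fraction_unmatched` helper, and the rule that any base-pair overlap on the same chromosome counts as a match are all assumptions, since the abstract does not state the matching criterion.

```python
from typing import NamedTuple, Optional, Sequence


class CNV(NamedTuple):
    """Hypothetical record for one called CNV in one animal."""
    chrom: str
    start: int  # first base pair of the CNV (inclusive)
    end: int    # last base pair of the CNV (inclusive)


def overlaps(a: CNV, b: CNV) -> bool:
    """True if both CNVs share at least one base pair (assumed matching rule)."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def fraction_unmatched(hd_calls: Sequence[CNV],
                       md_calls: Sequence[CNV]) -> Optional[float]:
    """Share of one animal's high-density CNVs with no overlapping
    medium-density CNV. Returns None when the animal has no
    high-density CNV calls."""
    if not hd_calls:
        return None
    unmatched = sum(
        1 for hd in hd_calls
        if not any(overlaps(hd, md) for md in md_calls)
    )
    return unmatched / len(hd_calls)


# Toy data for a single animal (invented for illustration only).
hd = [CNV("1", 100, 200), CNV("1", 500, 600), CNV("2", 100, 200)]
md = [CNV("1", 150, 160)]
print(fraction_unmatched(hd, md))
```

In the toy data, two of the three high-density CNVs have no medium-density counterpart. A stricter rule, such as a minimum reciprocal overlap, would only require changing `overlaps`.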
For 97.0% of CNVs called from the high-density genotypes, the corresponding genomic position in the medium-density genotypes of the same animal did not contain a called CNV. The average accuracy of imputation for CNV deletions was 0.281, with a standard deviation of 0.286. The average accuracy of imputation of the CNV normal state, i.e. the absence of a CNV, was 0.982 with a standard deviation of 0.022. Two CNV duplications were imputed in the Charolais, a single CNV duplication in the Limousins, and a single CNV duplication in the Holstein-Friesians; in all cases the CNV duplications were incorrectly imputed.
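Accuracy reported separately for deletions, the normal state, and duplications follows from the rule that an imputation is correct when the imputed copy number equals the called copy number. A minimal sketch of that bookkeeping is given below. The integer coding (2 for the normal state, lower for deletions, higher for duplications) and the `accuracy_by_state` helper are assumptions for illustration, and how the study averaged accuracy across CNVs is not specified in the abstract.

```python
from collections import defaultdict
from typing import Dict, Sequence


def accuracy_by_state(called: Sequence[int],
                      imputed: Sequence[int]) -> Dict[int, float]:
    """Fraction of correct imputations, grouped by the called copy number.

    An imputation counts as correct only when the imputed copy number
    equals the called copy number.
    """
    if len(called) != len(imputed):
        raise ValueError("called and imputed must have the same length")
    total: Dict[int, int] = defaultdict(int)
    correct: Dict[int, int] = defaultdict(int)
    for c, i in zip(called, imputed):
        total[c] += 1
        if c == i:
            correct[c] += 1
    return {state: correct[state] / total[state] for state in total}


# Toy data (invented): assumed coding 1 = deletion, 2 = normal, 3 = duplication.
called = [2, 2, 2, 1, 1, 3]
imputed = [2, 2, 2, 1, 2, 2]
print(accuracy_by_state(called, imputed))
```

Grouping by the called state keeps the abundant normal state from masking poor accuracy for the rarer deletions and duplications.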
The vast majority of CNVs called from the high-density genotypes were not detected using the medium-density genotypes. Furthermore, CNVs cannot be accurately predicted from flanking SNP haplotypes, at least based on the imputation algorithms routinely used in cattle, and using the SNPs currently available on the high-density genotype panel.