Bovine Functional Genomics Laboratory, ANRI, USDA-ARS, BARC-East, Beltsville, MD 20705, USA.
BMC Genomics. 2012 Aug 6;13:376. doi: 10.1186/1471-2164-13-376.
Btau_4.0 and UMD3.1 are two distinct cattle reference genome assemblies. In our previous study using the low-density BovineSNP50 array, we reported a copy number variation (CNV) analysis on Btau_4.0 with 521 animals from 21 cattle breeds, yielding 682 CNV regions (CNVRs) with a total length of 139.8 megabases.
In this study, using the high-density BovineHD SNP array, we performed high-resolution CNV analyses on both Btau_4.0 and UMD3.1 with 674 animals from 27 cattle breeds. We first compared the CNV results derived from the two SNP array platforms on Btau_4.0. With two thirds of the animals shared between the studies, we identified 3,346 candidate CNV regions on Btau_4.0, representing 142.7 megabases (~4.70%) of the genome. With a similar total length but about five times as many regions, the average CNVR length in the current Btau_4.0 dataset is much shorter than in the previous one (42.7 kb vs. 205 kb). Although subsets of the two results overlapped, 64% (91.6 megabases) of the current dataset was not present in the previous study. We also performed similar analyses on UMD3.1 using the same BovineHD SNP array results. Approximately 50% more CNVs were called on UMD3.1 than on Btau_4.0, and they were about 20% longer. However, the resulting set of CNVRs was comparable (3,438 regions with a total length of 146.9 megabases). We suspect that these differences reflect the UMD3.1 assembly's placement of previously unplaced contigs and removal of unmerged alleles. Selected CNVs were further validated experimentally, achieving a 73% PCR validation rate, which is considerably higher than the previous validation rate. About 20-45% of CNV regions overlapped with cattle RefSeq genes and Ensembl genes. Panther and IPA analyses indicated that these genes participate in a wide spectrum of biological processes involving the immune system, lipid metabolism, and cell, organism and system development.
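The summary figures above can be cross-checked with simple arithmetic. The following is a minimal sketch, not part of the study's pipeline: it recomputes the mean CNVR length from the reported total lengths and region counts, along with the novel fraction and the ratio of region counts. The names `DATASETS` and `mean_length_kb` are illustrative, and the only inputs are the rounded values quoted in the abstract.

```python
# Figures quoted in the abstract: (total CNVR length in megabases, region count).
DATASETS = {
    "BovineSNP50 / Btau_4.0": (139.8, 682),
    "BovineHD / Btau_4.0": (142.7, 3346),
    "BovineHD / UMD3.1": (146.9, 3438),
}


def mean_length_kb(total_mb, n_regions):
    """Mean region length in kilobases (1 Mb = 1,000 kb)."""
    return total_mb * 1000.0 / n_regions


def novel_fraction(novel_mb, total_mb):
    """Share of the current dataset not seen in the previous study."""
    return novel_mb / total_mb


if __name__ == "__main__":
    for name, (total_mb, n_regions) in DATASETS.items():
        print(f"{name}: {mean_length_kb(total_mb, n_regions):.1f} kb per region")
    print(f"novel fraction on Btau_4.0: {novel_fraction(91.6, 142.7):.1%}")
    print(f"region count ratio (HD vs. SNP50): {3346 / 682:.1f}x")
```

Because the inputs are rounded, the recomputed means (about 205 kb and 42.6 kb) agree with the reported 205 kb and 42.7 kb only to within rounding. The region count ratio of roughly 4.9 matches the "five times" description.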
We present a comprehensive set of cattle CNV results at higher resolution and sensitivity. We identified over 3,000 candidate CNV regions on each of Btau_4.0 and UMD3.1, compared the current datasets with previous results, and examined the impact of genome assembly on CNV calling.