University of Natural Resources and Life Sciences (BOKU), Vienna, Austria.
AgroPartners Consulting, R. Floriano Peixoto, 120 - Sala 43A - Centro, Araçatuba, SP, 16010-220, Brazil.
Genet Sel Evol. 2021 Dec 18;53(1):96. doi: 10.1186/s12711-021-00688-1.
Reference genomes are essential in the analysis of genomic data. As the cost of sequencing decreases, multiple reference genomes are being produced within species to alleviate problems such as low mapping accuracy and reference allele bias in variant calling, which can arise when divergent samples are aligned to a single reference individual. The latest reference sequence adopted by the scientific community for the analysis of cattle data is ARS_UCD1.2, built from the DNA of a Hereford cow (Bos taurus taurus, hereafter B. taurus). A complementary genome assembly, UOA_Brahman_1, was recently built from a Brahman cow haplotype to represent the other cattle subspecies (Bos taurus indicus, hereafter B. indicus) and to further support the analysis of B. indicus data. In this study, we aligned the sequence data of 15 B. taurus and B. indicus breeds to each of these references.
The alignment of B. taurus individuals against UOA_Brahman_1 detected up to five million more single-nucleotide variants (SNVs) than the alignment against ARS_UCD1.2. Similarly, the alignment of B. indicus individuals against ARS_UCD1.2 resulted in 1.5 million more SNVs than the alignment against UOA_Brahman_1. The number of SNVs with nearly fixed alternative alleles also increased in the cross-subspecies alignments. Interestingly, the alignment of B. taurus cattle against UOA_Brahman_1 revealed regions with fewer SNVs with nearly fixed alternative alleles than expected. Since B. taurus introgression represents on average 10% of the genome of Brahman cattle, we suggest that these regions comprise taurine rather than indicine DNA in the UOA_Brahman_1 reference genome. Principal component and admixture analyses using genotypes inferred from these regions support the presence of these taurine-introgressed loci. Overall, the flagged taurine segments represent 13.7% of the UOA_Brahman_1 assembly. The genes located within these segments were previously reported to be under positive selection in Brahman cattle, and include functional candidate genes implicated in feed efficiency, development and immunity.
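The idea of flagging regions with fewer nearly fixed alternative alleles than expected can be illustrated with a minimal sketch. It is not the authors' pipeline: the window size, the 0.95 frequency threshold for "nearly fixed", the median-based cutoff, and the function names `count_nearly_fixed` and `flag_low_windows` are all assumptions chosen for illustration, and the input is toy data rather than real variant calls.

```python
from collections import Counter
from statistics import median


def count_nearly_fixed(snvs, window=1_000_000, min_alt_freq=0.95):
    """Count SNVs with a nearly fixed alternative allele per genomic window.

    snvs: iterable of (chromosome, position, alternative_allele_frequency).
    The 0.95 threshold and 1 Mb window are illustrative assumptions.
    Windows containing no SNV at all are not registered.
    """
    counts = Counter()
    for chrom, pos, alt_freq in snvs:
        key = (chrom, pos // window)
        counts[key] += 0  # register the window even if nothing is nearly fixed
        if alt_freq >= min_alt_freq:
            counts[key] += 1
    return counts


def flag_low_windows(counts, fraction=0.25):
    """Return windows whose count is below `fraction` of the median count.

    The median-based cutoff is a hypothetical stand-in for whatever
    expectation a real analysis would use.
    """
    if not counts:
        return []
    cutoff = fraction * median(counts.values())
    return sorted(key for key, n in counts.items() if n < cutoff)


# Toy B. taurus calls against an indicine reference (window of 100 bp for brevity).
toy_snvs = [
    ("1", 10, 0.99), ("1", 20, 0.97), ("1", 30, 1.00), ("1", 40, 0.96),
    ("1", 110, 0.99), ("1", 120, 0.98), ("1", 130, 0.95), ("1", 140, 0.50),
    ("1", 210, 0.10), ("1", 220, 0.30),
]
toy_counts = count_nearly_fixed(toy_snvs, window=100)
candidate_taurine_windows = flag_low_windows(toy_counts)
print(candidate_taurine_windows)
```

In this toy example the third window carries no nearly fixed alternative alleles, so it is the one flagged as a candidate taurine segment in the reference. A real analysis would start from population-level allele frequencies and would need a properly justified expectation for each window.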
We report a list of taurine segments in the UOA_Brahman_1 assembly, which will be useful for interpreting genomic features of interest (e.g., signatures of selection, runs of homozygosity, increased mutation rate) that could appear in future re-sequencing analyses of indicine cattle.