Animal Genomics and Improvement Laboratory, USDA-ARS, 10300 Baltimore Ave, Beltsville, MD 20705-2350, USA.
Dairy Forage Research Center, USDA-ARS, 1925 Linden Drive, Madison, WI, 53706, USA.
Gigascience. 2020 Mar 1;9(3). doi: 10.1093/gigascience/giaa021.
Major advances in selection progress for cattle have been made following the introduction of genomic tools over the past 10-12 years. These tools depend upon the Bos taurus reference genome (UMD3.1.1), which was created using now-outdated technologies and is hindered by a variety of deficiencies and inaccuracies.
We present the new reference genome for cattle, ARS-UCD1.2, based on the same animal as the original to facilitate transfer and interpretation of results obtained from the earlier version, but applying a combination of modern technologies in a de novo assembly to increase continuity, accuracy, and completeness. The assembly includes 2.7 Gb and is >250× more continuous than the original assembly, with contig N50 >25 Mb and L50 of 32. We also greatly expanded supporting RNA-based data for annotation that identifies 30,396 total genes (21,039 protein coding). The new reference assembly is accessible in annotated form for public use.
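For readers unfamiliar with the continuity statistics quoted above, the following is a minimal sketch of how contig N50 and L50 are conventionally computed from a list of contig lengths. The function name `n50_l50` and the toy lengths are illustrative assumptions, not taken from the assembly pipeline used for ARS-UCD1.2.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig lengths.

    N50 is the length of the contig at which the running total of
    lengths, taken from longest to shortest, first reaches half of
    the total assembly size. L50 is the number of contigs needed to
    reach that point.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count


# Hypothetical toy assembly (arbitrary units), not real cattle data.
toy_contigs = [40, 30, 20, 10, 5, 5]
print(n50_l50(toy_contigs))  # -> (30, 2)
```

Under this definition, the reported contig N50 >25 Mb with L50 of 32 means that the 32 longest contigs together cover at least half of the assembled sequence, and the shortest of those 32 is longer than 25 Mb.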
We demonstrate that improved continuity of assembled sequence warrants the adoption of ARS-UCD1.2 as the new cattle reference genome and that increased assembly accuracy will benefit future research on this species.