Towards complete and error-free genome assemblies of all vertebrate species.

Affiliations

Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA.

Department of Genetics, University of Cambridge, Cambridge, UK.

Publication information

Nature. 2021 Apr;592(7856):737-746. doi: 10.1038/s41586-021-03451-0. Epub 2021 Apr 28.

Abstract

High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species. To address this issue, the international Genome 10K (G10K) consortium has worked over a five-year period to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete reference genomes. Here we present lessons learned from generating assemblies for 16 species that represent six major vertebrate lineages. We confirm that long-read sequencing technologies are essential for maximizing genome quality, and that unresolved complex repeats and haplotype heterozygosity are major sources of assembly error when not handled correctly. Our assemblies correct substantial errors, add missing sequence in some of the best historical reference genomes, and reveal biological discoveries. These include the identification of many false gene duplications, increases in gene sizes, chromosome rearrangements that are specific to lineages, a repeated independent chromosome breakpoint in bat genomes, and a canonical GC-rich pattern in protein-coding genes and their regulatory regions. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help to enable a new era of discovery across the life sciences.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6912/8081667/35d04bc38998/41586_2021_3451_Fig1_HTML.jpg
