Complete vertebrate mitogenomes reveal widespread repeats and gene duplications.

Affiliations

The Vertebrate Genome Lab, Rockefeller University, New York, NY, USA.

Laboratory of Neurogenetics of Language, Rockefeller University, New York, NY, USA.

Publication information

Genome Biol. 2021 Apr 29;22(1):120. doi: 10.1186/s13059-021-02336-9.

Abstract

BACKGROUND

Modern sequencing technologies should make the assembly of the relatively small mitochondrial genomes an easy undertaking. However, few tools exist that address mitochondrial assembly directly.

RESULTS

As part of the Vertebrate Genomes Project (VGP) we develop mitoVGP, a fully automated pipeline for similarity-based identification of mitochondrial reads and de novo assembly of mitochondrial genomes that incorporates both long (> 10 kbp, PacBio or Nanopore) and short (100-300 bp, Illumina) reads. Our pipeline leads to successful complete mitogenome assemblies of 100 vertebrate species of the VGP. We observe that tissue type and library size selection have considerable impact on mitogenome sequencing and assembly. Comparing our assemblies to purportedly complete reference mitogenomes based on short-read sequencing, we identify errors, missing sequences, and incomplete genes in those references, particularly in repetitive regions. Our assemblies also identify novel gene region duplications. The presence of repeats and duplications in over half of the species herein assembled indicates that their occurrence is a principle of mitochondrial structure rather than an exception, shedding new light on mitochondrial genome evolution and organization.

CONCLUSIONS

Our results indicate that even in the "simple" case of vertebrate mitogenomes the completeness of many currently available reference sequences can be further improved, and caution should be exercised before claiming the complete assembly of a mitogenome, particularly from short reads alone.
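The "similarity-based identification of mitochondrial reads" mentioned in the Results can be illustrated with a minimal sketch. This is not mitoVGP's implementation, which the abstract does not detail; it is a simplified k-mer-sharing filter against a reference mitogenome, and the function names, the value of `k`, and the `min_fraction` threshold are all hypothetical choices made for illustration. Real long reads carry sequencing errors, so a production pipeline would more plausibly rely on a dedicated read aligner.

```python
from typing import Dict, List, Set

# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all k-mers in seq (empty if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(reference: str, k: int) -> Set[str]:
    """Index k-mers of a circular reference on both strands.

    The first k-1 bases are appended so that k-mers spanning the
    arbitrary linearization point of the circular molecule are included.
    """
    circular = reference + reference[:k - 1]
    return kmers(circular, k) | kmers(reverse_complement(circular), k)


def shared_fraction(read: str, index: Set[str], k: int) -> float:
    """Fraction of the read's distinct k-mers that occur in the index."""
    read_kmers = kmers(read, k)
    if not read_kmers:
        return 0.0
    return len(read_kmers & index) / len(read_kmers)


def select_mito_reads(
    reads: Dict[str, str],
    reference: str,
    k: int = 5,                 # hypothetical; far too small for real data
    min_fraction: float = 0.5,  # hypothetical threshold
) -> List[str]:
    """Return names of reads whose k-mer content resembles the reference."""
    index = build_index(reference, k)
    return [
        name
        for name, seq in reads.items()
        if shared_fraction(seq, index, k) >= min_fraction
    ]


if __name__ == "__main__":
    # Toy reference and reads, invented purely for demonstration.
    reference = "ACGTTGCAAGGCTTACCGATAGCTAGGATCCGTA"
    reads = {
        "forward": reference[5:25],
        "reverse": reverse_complement(reference[5:25]),
        "spans_origin": reference[-6:] + reference[:6],
        "unrelated": "T" * 20,
    }
    print(select_mito_reads(reads, reference))
```

Treating the reference as circular matters here: a read that crosses the point where the mitogenome was linearized would otherwise lose the k-mers spanning that junction and could be wrongly discarded.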


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ecde/8082918/952b1c57c0f5/13059_2021_2336_Fig1_HTML.jpg
