Genomic repeats, misassembly and reannotation: a case study with long-read resequencing of Porphyromonas gingivalis reference strains.

Author information

Institut de Génétique et Développement de Rennes, CNRS, UMR6290, Université de Rennes 1, Rennes, France.

Laboratorio de Investigación en Bacteriología Anaerobia, Centro de Investigación en Enfermedades Tropicales, Facultad de Microbiología, Universidad de Costa Rica, San José, Costa Rica.

Publication information

BMC Genomics. 2018 Jan 16;19(1):54. doi: 10.1186/s12864-017-4429-4.

Abstract

BACKGROUND

Functional models of the bacteria that make up human and animal microbiota cannot be built without knowledge of their genomic sequences. Unfortunately, the vast majority of publicly available genomes are only working drafts, and this incompleteness causes numerous problems and constitutes a major obstacle to genotypic and phenotypic interpretation. In this work, we began with an example from the class Bacteroidia in the phylum Bacteroidetes, which is preponderant among human orodigestive microbiota. We identified the genetic loci responsible for assembly breaks and misassemblies and demonstrated the importance and usefulness of long-read sequencing and curated reannotation.

RESULTS

We showed that the fragmentation of Bacteroidia draft genomes assembled from massively parallel sequencing correlates linearly with genomic repeats that are at least as long as the reads. We also demonstrated that some of these repeats, especially the long ones, correspond to misassembled loci in three reference Porphyromonas gingivalis genomes marked as circularized (and thus considered complete or finished). We showed that even at modest coverage (30X), long-read resequencing together with PCR contiguity verification (of rrn operons and an integrative and conjugative element, or ICE) can be used to identify and correct the wrongly combined or assembled regions. Finally, although time-consuming and labor-intensive, consistent manual biocuration of three P. gingivalis strains allowed us to compare and correct the existing genomic annotations, resulting in a more accurate interpretation of the genomic differences among these strains.
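
To illustrate why repeats at least as long as the reads matter, the following minimal sketch lists every exact substring of read length that occurs more than once in a sequence. A read cannot span such a repeat, so an assembler has no way to decide which flanking regions belong together. This is not the method used in the study: the function name `repeated_kmers`, the toy sequence, and the restriction to exact, forward-strand matches are all assumptions made for illustration.

```python
from collections import Counter


def repeated_kmers(genome: str, read_length: int) -> dict[str, int]:
    """Return each exact substring of length `read_length` seen more than once.

    Illustrative only: forward strand, exact matches, no reverse complement.
    """
    genome = genome.upper()
    # Count every window of length `read_length` along the sequence.
    counts = Counter(
        genome[i:i + read_length]
        for i in range(len(genome) - read_length + 1)
    )
    # Keep only the windows that occur at least twice.
    return {kmer: n for kmer, n in counts.items() if n > 1}


# Hypothetical toy sequence containing one 6-base repeat (ACGTAC).
toy = "AAACGTACCCGGGACGTACTTT"

print(repeated_kmers(toy, 6))  # the repeat is as long as a 6-base "read"
print(repeated_kmers(toy, 7))  # a 7-base "read" reaches unique flanking bases
```

With 6-base windows the repeat is reported, whereas 7-base windows each include at least one unique flanking base and nothing is reported. The same intuition explains why longer reads resolve loci that short reads leave broken or misjoined.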

CONCLUSIONS

In this study, we demonstrate the usefulness and importance of long-read sequencing in verifying published genomes (even when complete) and generating assemblies for new bacterial strains/species with high genomic plasticity. We also show that when combined with biological validation processes and diligent biocurated annotation, this strategy helps reduce the propagation of errors in shared databases, thus limiting false conclusions based on incomplete or misleading information.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d1fb/5771137/9b798df406d6/12864_2017_4429_Fig1_HTML.jpg
