Pushing the limits of de novo genome assembly for complex prokaryotic genomes harboring very long, near identical repeats.

Affiliations

Agroscope, Molecular Diagnostics, Genomics & Bioinformatics, Wädenswil CH-8820, Switzerland.

SIB Swiss Institute of Bioinformatics, Wädenswil CH-8820, Switzerland.

Publication information

Nucleic Acids Res. 2018 Sep 28;46(17):8953-8965. doi: 10.1093/nar/gky726.

Abstract

Generating a complete, de novo genome assembly for prokaryotes is often considered a solved problem. However, we show here that Pseudomonas koreensis P19E3 harbors multiple, near identical repeat pairs up to 70 kilobase pairs in length, which contained several genes that may confer fitness advantages to the strain. Its complex genome, which also included a variable shufflon region, could not be de novo assembled with long reads produced by Pacific Biosciences' technology, but required very long reads from Oxford Nanopore Technologies. Importantly, a repeat analysis, whose results we release for over 9600 prokaryotes, indicated that very complex bacterial genomes represent a general phenomenon beyond Pseudomonas. Roughly 10% of 9331 complete bacterial and a handful of 293 complete archaeal genomes represented this 'dark matter' for de novo genome assembly of prokaryotes. Several of these 'dark matter' genome assemblies contained repeats far beyond the resolution of the sequencing technology employed and likely contain errors; other genomes were closed employing labor-intensive steps such as cosmid libraries, primer walking or optical mapping. Using very long sequencing reads in combination with assembly algorithms capable of resolving long, near identical repeats will bring most prokaryotic genomes within reach of fast and complete de novo genome assembly.
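
As a rough illustration of why repeat length matters for read-based assembly, the sketch below checks whether a single read of a given length could cover a repeat together with some unique flanking sequence on both sides. The function name `read_can_span_repeat`, the `min_anchor` parameter and its 1 kb default, and the example read lengths are all assumptions made for illustration; they are not values or methods from the paper. Real assemblers apply more involved criteria that also depend on coverage and on how similar the repeat copies are.

```python
def read_can_span_repeat(read_length: int, repeat_length: int,
                         min_anchor: int = 1000) -> bool:
    """Return True if one read could cover the whole repeat plus
    `min_anchor` bases of unique flanking sequence on each side.

    `min_anchor` is a hypothetical threshold chosen for this sketch.
    """
    if read_length < 0 or repeat_length < 0 or min_anchor < 0:
        raise ValueError("lengths must be non-negative")
    return read_length >= repeat_length + 2 * min_anchor


if __name__ == "__main__":
    # 70 kb is the longest repeat length mentioned in the abstract.
    repeat_bp = 70_000
    # Hypothetical read lengths, not measurements from the study.
    for read_bp in (20_000, 60_000, 80_000):
        print(read_bp, read_can_span_repeat(read_bp, repeat_bp))
```

Under this simplified criterion, only reads longer than the repeat plus both anchors can place the two copies unambiguously, which is consistent with the abstract's point that very long reads were needed for this genome.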

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/00c0/6158609/44da9a92a78d/gky726fig1.jpg
