The Laboratory for Molecular Infection Medicine Sweden (MIMS), 901 87 Umeå, Sweden.
Umeå Centre for Microbial Research (UCMR), 901 87 Umeå, Sweden.
BMC Genomics. 2020 Apr 6;21(1):285. doi: 10.1186/s12864-020-6565-5.
Shigella is a Gram-negative facultative intracellular bacterium that causes bacillary dysentery in humans. Shigella invades cells of the colonic mucosa by means of its virulence plasmid-encoded Type 3 Secretion System (T3SS), and multiplies in the cytosol of the target cell. Although the laboratory reference strain S. flexneri serotype 5a M90T has been used extensively to study the molecular mechanisms of pathogenesis, its complete genome sequence was not available, which greatly limited studies employing high-throughput sequencing and systems biology approaches.
We have sequenced, assembled, annotated and manually curated the full genome of S. flexneri 5a M90T. This yielded two complete circular contigs: the chromosome and the virulence plasmid (pWR100). To obtain the genome sequence, we employed long-read PacBio DNA sequencing followed by polishing with Illumina RNA-seq data. This constitutes a new hybrid strategy for producing gapless, highly accurate genome sequences, which also covers AT-rich tracts and repetitive sequences, provided they are transcribed. Furthermore, we performed a genome-wide analysis of transcriptional start sites (TSS) and determined the length of 5' untranslated regions (5'-UTRs) under the culture conditions typically used to prepare the inoculum for in vitro infection experiments. We identified 6723 primary TSS (pTSS) and 7328 secondary TSS (sTSS). The annotated S. flexneri 5a M90T genome sequence and the transcriptional start sites have been integrated into the RegulonDB (http://regulondb.ccg.unam.mx) and RSAT (http://embnet.ccg.unam.mx/rsat/) databases, so that their analysis tools can be applied to the S. flexneri 5a M90T genome.
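The pTSS/sTSS distinction and the 5'-UTR lengths can be illustrated with a minimal Python sketch. It assumes one common dRNA-seq convention that the abstract does not spell out: a TSS is assigned to a gene when it lies on the same strand within a hypothetical 300-nt window upstream of the start codon, the strongest such TSS is labeled primary and any others secondary. The `Gene` and `TSS` classes, `UPSTREAM_WINDOW`, and the signal values are illustrative assumptions, not the authors' pipeline.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical assignment window (an assumption, not taken from the paper):
# TSSs up to this many nt upstream of a start codon are attributed to that gene.
UPSTREAM_WINDOW = 300


@dataclass(frozen=True)
class Gene:
    name: str
    start_codon: int  # 1-based coordinate of the first base of the start codon
    strand: str       # "+" or "-"


@dataclass(frozen=True)
class TSS:
    position: int     # 1-based coordinate of the first transcribed base
    strand: str       # "+" or "-"
    signal: float     # e.g. enriched read count at this position (illustrative)


def utr_length(tss: TSS, gene: Gene) -> int:
    """Strand-aware 5'-UTR length: bases from the TSS up to, not including, the start codon.

    A negative value means the TSS lies downstream of the start codon.
    """
    if gene.strand == "+":
        return gene.start_codon - tss.position
    # On the minus strand, "upstream" means higher genomic coordinates.
    return tss.position - gene.start_codon


def classify_tss(genes: List[Gene], sites: List[TSS]) -> Dict[str, List[Tuple[str, int]]]:
    """Label each gene's upstream TSSs as primary (strongest) or secondary (the rest)."""
    result: Dict[str, List[Tuple[str, int]]] = {}
    for gene in genes:
        candidates = [
            t for t in sites
            if t.strand == gene.strand and 0 <= utr_length(t, gene) <= UPSTREAM_WINDOW
        ]
        # Strongest signal first; ties are broken by the shorter 5'-UTR.
        candidates.sort(key=lambda t: (-t.signal, utr_length(t, gene)))
        result[gene.name] = [
            ("pTSS" if i == 0 else "sTSS", utr_length(t, gene))
            for i, t in enumerate(candidates)
        ]
    return result


if __name__ == "__main__":
    genes = [Gene("geneA", 1000, "+"), Gene("geneB", 5000, "-")]
    sites = [
        TSS(950, "+", 120.0),   # 50 nt upstream of geneA, strongest
        TSS(800, "+", 30.0),    # 200 nt upstream of geneA, weaker
        TSS(5040, "-", 55.0),   # 40 nt upstream of geneB on the minus strand
        TSS(2000, "+", 10.0),   # downstream of geneA, not assigned
    ]
    print(classify_tss(genes, sites))
    # → {'geneA': [('pTSS', 50), ('sTSS', 200)], 'geneB': [('pTSS', 40)]}
```

A TSS falling inside the windows of two neighbouring genes is classified once per gene in this sketch; real pipelines differ in how they resolve such overlaps and in the window they use.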
We provide the first complete genome for S. flexneri serotype 5a, specifically for the laboratory reference strain M90T. Our work makes it possible to employ S. flexneri M90T in high-quality systems biology studies, such as transcriptomic and differential expression analyses, and in genome evolution studies. Moreover, the catalogue of TSS reported here can serve molecular pathogenesis studies as a resource for determining which genes are transcribed before host cells are infected. The genome sequence, together with the analysis of transcriptional start sites, is also a valuable tool for the precise genetic manipulation of S. flexneri 5a M90T. Further, we present a new hybrid strategy for producing gapless, highly accurate genome sequences. Unlike current hybrid strategies, which combine long- and short-read DNA sequencing technologies to maximize accuracy, our workflow pairs long-read DNA sequencing with short-read RNA sequencing; because these technologies are non-redundant, they yield distinct, exploitable datasets.
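The polishing step of such a hybrid workflow can be pictured as consensus correction from aligned short reads. Below is a deliberately simplified Python sketch, assuming the RNA-seq reads have already been aligned to the PacBio assembly and summarized as a per-position pileup; the `polish_substitutions` function, the `pileup` dict, and the `min_depth` and `min_fraction` thresholds are hypothetical. It corrects substitutions only, whereas dedicated polishers such as Pilon also handle small insertions and deletions, and it is not the authors' workflow.

```python
from collections import Counter
from typing import Dict, List


def polish_substitutions(assembly: str, pileup: Dict[int, List[str]],
                         min_depth: int = 5, min_fraction: float = 0.8) -> str:
    """Replace assembly bases where aligned reads strongly agree on a different base.

    pileup maps 0-based assembly positions to the read bases aligned there.
    Positions absent from the pileup (for RNA-seq reads, untranscribed regions)
    or covered by fewer than min_depth reads are left unchanged.
    """
    bases = list(assembly)
    for pos, observed in pileup.items():
        if len(observed) < min_depth:
            continue  # too little evidence to override the long-read base
        base, count = Counter(observed).most_common(1)[0]
        if base != bases[pos] and count / len(observed) >= min_fraction:
            bases[pos] = base
    return "".join(bases)


if __name__ == "__main__":
    draft = "ACGTACGT"
    pileup = {
        2: ["T"] * 9 + ["G"],  # 90% of reads disagree with the draft "G": corrected
        5: ["A"] * 3,          # depth 3 is below min_depth: left unchanged
        6: ["G"] * 10,         # reads agree with the draft: nothing to do
    }
    print(polish_substitutions(draft, pileup))  # → ACTTACGT
```

Because only covered positions can change, untranscribed regions keep their long-read bases, which matches the point that RNA-based polishing reaches AT-rich tracts and repeats only where they are transcribed.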