Molecular Genetics Group, Groningen Biomolecular Sciences and Biotechnology Institute, Centre for Synthetic Biology, University of Groningen, Nijenborgh 7, 9747 AG Groningen, the Netherlands.
Department of Fundamental Microbiology, Faculty of Biology and Medicine, University of Lausanne, Biophore Building, CH-1015 Lausanne, Switzerland.
Nucleic Acids Res. 2018 Nov 2;46(19):9971-9989. doi: 10.1093/nar/gky725.
A precise understanding of the genomic organization into transcriptional units and their regulation is essential for our comprehension of opportunistic human pathogens and how they cause disease. Using single-molecule real-time (PacBio) sequencing, we unambiguously determined the genome sequence of Streptococcus pneumoniae strain D39 and revealed several inversions previously undetected by short-read sequencing. Significantly, a chromosomal inversion results in antigenic variation of PhtD, an important surface-exposed virulence factor. We generated a new genome annotation using automated tools, followed by manual curation, reflecting the current knowledge in the field. By combining sequence-driven terminator prediction, deep paired-end transcriptome sequencing and enrichment of primary transcripts by Cappable-Seq, we mapped 1015 transcriptional start sites and 748 termination sites. We show that the pneumococcal transcriptional landscape is complex and includes many secondary, antisense and internal promoters. Using this new genomic map, we identified several new small RNAs (sRNAs), RNA switches (including sixteen previously misidentified as sRNAs), and antisense RNAs. In total, we annotated 89 new protein-encoding genes, 34 sRNAs and 165 pseudogenes, bringing the S. pneumoniae D39 repertoire to 2146 genetic elements. We report operon structures and observe that 9% of operons are leaderless. The genome data are accessible in an online resource called PneumoBrowse (https://veeninglab.com/pneumobrowse), which provides one of the most complete inventories of a bacterial genome to date. PneumoBrowse will accelerate pneumococcal research and the development of new prevention and treatment strategies.