New England Biolabs Inc., 240 County Road, Ipswich, MA, 01938, USA.
PacBio, 1305 O'Brien Drive, Menlo Park, CA, 94025, USA.
Nat Commun. 2018 Sep 10;9(1):3676. doi: 10.1038/s41467-018-05997-6.
Current methods for genome-wide analysis of gene expression require fragmentation of original transcripts into small fragments for short-read sequencing. In bacteria, the resulting fragmented information hides operon complexity. Additionally, in vivo processing of transcripts confounds the accurate identification of the 5' and 3' ends of operons. Here we develop a methodology called SMRT-Cappable-seq that combines the isolation of unfragmented primary transcripts with single-molecule long-read sequencing. Applied to E. coli, this technology results in an accurate definition of the transcriptome, with 34% of known operons from RegulonDB being extended by at least one gene. Furthermore, 40% of transcription termination sites have read-through that alters the gene content of the operons. As a result, most bacterial genes are present in multiple operon variants, reminiscent of eukaryotic splicing. By providing such granularity in operon structure, this study represents an important resource for the study of prokaryotic gene networks and regulation.