Centre de Biophysique Moléculaire, CNRS UPR4301, rue Charles Sadron, 45071 Orléans cedex 2, France.
ED 549, Sciences Biologiques & Chimie du Vivant, Université d'Orléans, France.
Nucleic Acids Res. 2018 Sep 19;46(16):8245-8260. doi: 10.1093/nar/gky563.
Bacterial transcription termination proceeds via two main mechanisms, triggered either by simple, well-conserved (intrinsic) nucleic acid motifs or by the motor protein Rho. Although bacterial genomes can harbor hundreds of termination signals of either type, only intrinsic terminators are reliably predicted. Computational tools to detect the more complex and diverse Rho-dependent terminators are lacking. To tackle this issue, we devised a prediction method based on Orthogonal Projections to Latent Structures Discriminant Analysis (OPLS-DA) of a large set of in vitro termination data. Using previously uncharacterized genomic sequences for biochemical evaluation and OPLS-DA, we identified new Rho-dependent signals and quantitative sequence descriptors with significant predictive value. The most relevant descriptors capture features of transcript C>G skewness, secondary structure, and richness in regularly spaced 5'CC/UC dinucleotides, all consistent with known principles of Rho-RNA interaction. Together, the descriptors support OPLS-DA predictions of Rho-dependent termination with a success rate of about 85%. Scanning the Escherichia coli genome with the OPLS-DA model identifies significantly more termination-competent regions than anticipated from transcriptomics and predicts that regions intrinsically refractory to Rho are primarily located in open reading frames. Altogether, this work delineates features important for Rho activity and describes the first method able to predict Rho-dependent terminators in bacterial genomes.
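To make the sequence descriptors named in the abstract more concrete, the following is a minimal Python sketch of how C>G skewness and CC/UC dinucleotide content might be computed over sliding windows of a transcript. It is not the authors' implementation: the skew formula (C - G)/(C + G), the window and step sizes, and the function names `cg_skew`, `yc_positions`, and `window_descriptors` are all assumptions made for illustration, and the secondary-structure descriptors and the OPLS-DA model itself are not covered.

```python
from typing import Dict, List


def cg_skew(rna: str) -> float:
    """Assumed skew definition: (C - G) / (C + G); positive when C outnumbers G.

    Returns 0.0 when the sequence contains neither C nor G.
    """
    c = rna.count("C")
    g = rna.count("G")
    return (c - g) / (c + g) if (c + g) else 0.0


def yc_positions(rna: str) -> List[int]:
    """0-based start positions of CC or UC dinucleotides (overlaps allowed)."""
    return [i for i in range(len(rna) - 1) if rna[i] in "CU" and rna[i + 1] == "C"]


def window_descriptors(rna: str, window: int = 80, step: int = 20) -> List[Dict[str, float]]:
    """Compute illustrative descriptors for each sliding window of a transcript.

    The window and step defaults are hypothetical, not values from the paper.
    DNA input is accepted and converted to RNA (T -> U).
    """
    rna = rna.upper().replace("T", "U")
    rows = []
    # A sequence shorter than one window still yields a single (short) window.
    for start in range(0, max(len(rna) - window, 0) + 1, step):
        seg = rna[start:start + window]
        pos = yc_positions(seg)
        # Distances between consecutive CC/UC dinucleotides, a crude proxy for spacing.
        gaps = [b - a for a, b in zip(pos, pos[1:])]
        rows.append({
            "start": start,
            "cg_skew": cg_skew(seg),
            "yc_count": len(pos),
            "mean_yc_gap": sum(gaps) / len(gaps) if gaps else 0.0,
        })
    return rows
```

Each returned row could serve as one feature vector for a downstream classifier; which descriptors and window sizes actually carry predictive value is established in the paper, not by this sketch.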