Department of Biological and Environmental Sciences and Technologies, University of Salento, Lecce, Italy.
Humanitas Clinical and Research Center, Rozzano, Milan, Italy.
BMC Bioinformatics. 2019 Mar 7;20(1):117. doi: 10.1186/s12859-019-2704-x.
In bacterial genomes, there are two mechanisms of transcription termination: "intrinsic" or Rho-independent termination, and Rho-dependent termination. Intrinsic terminators are characterized by an RNA hairpin followed by a run of 6-8 U residues, and they are relatively easy to identify with one of the many available prediction programs. In contrast, Rho-dependent termination is mediated by the Rho protein factor, which first binds to ribosome-free mRNA at a site with a C > G content and then reaches the RNA polymerase to induce its release. Unlike intrinsic terminators, Rho-dependent terminators are very difficult to predict computationally in prokaryotes, because the sequence features required for Rho function are complex and poorly defined. For this reason, no exhaustive Rho-dependent terminator prediction program exists yet.
In this study we introduce RhoTermPredict, the first published algorithm for exhaustive prediction of Rho-dependent terminators in bacterial genomes. RhoTermPredict identifies these elements based on a previously proposed consensus motif common to all Rho-dependent transcription terminators. It essentially searches for a 78 nt long RUT site with a C > G content and regularly spaced C residues, followed by a putative pause site for the RNA polymerase. We tested the performance of RhoTermPredict using available genomic and transcriptomic data from Escherichia coli K-12, on both limited-length sequences and the whole genome, and using available genomic sequences from Bacillus subtilis 168 and Salmonella enterica LT2. We also estimated the overlap between RhoTermPredict predictions and those of the intrinsic terminator predictor ARNold webtool. Our results showed that RhoTermPredict performs well both on limited-length sequences (F-score of about 0.7) and in genome-wide analysis. Furthermore, the degree of overlap with ARNold predictions was very low.
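The RUT-site search described above can be illustrated with a minimal Python sketch. It slides a 78 nt window along a sequence and keeps windows that contain more C than G residues and several C residues separated by a roughly constant interval. The spacing range (11-13 nt), the minimum number of spaced C pairs (6), and all function names are hypothetical choices made for illustration; they are not RhoTermPredict's published parameters, and the downstream pause-site check is omitted.

```python
# Minimal sketch of a sliding-window RUT-site scan.
# ASSUMPTIONS (hypothetical, not RhoTermPredict's published criteria):
#   - a window qualifies if it has more C than G residues, and
#   - it contains at least MIN_SPACED_PAIRS pairs of C residues whose
#     distance falls in SPACING_RANGE, a crude proxy for "regularly spaced C".
# The putative RNA polymerase pause site downstream is NOT modeled here.

WINDOW = 78                    # RUT site length given in the abstract
SPACING_RANGE = range(11, 14)  # hypothetical C-C spacing (11, 12 or 13 nt)
MIN_SPACED_PAIRS = 6           # hypothetical minimum number of spaced C pairs


def spaced_c_pairs(window: str) -> int:
    """Count pairs of C residues whose distance lies in SPACING_RANGE."""
    count = 0
    for i, base in enumerate(window):
        if base != "C":
            continue
        for d in SPACING_RANGE:
            if i + d < len(window) and window[i + d] == "C":
                count += 1
    return count


def is_candidate_rut(window: str) -> bool:
    """Apply both hypothetical filters: C > G and regularly spaced C residues."""
    return (window.count("C") > window.count("G")
            and spaced_c_pairs(window) >= MIN_SPACED_PAIRS)


def scan_rut_sites(seq: str, step: int = 1) -> list[int]:
    """Return 0-based start positions of windows passing both filters."""
    seq = seq.upper()
    return [
        start
        for start in range(0, len(seq) - WINDOW + 1, step)
        if is_candidate_rut(seq[start:start + WINDOW])
    ]


if __name__ == "__main__":
    # A toy sequence with one C every 12 nt, padded with A residues.
    toy = ("C" + "A" * 11) * 7 + "A" * 30
    print(scan_rut_sites(toy))
```

Overlapping windows that pass the filters would in practice be merged into a single candidate region before any pause-site search; that step is left out to keep the sketch short.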
Our analysis shows that RhoTermPredict is a powerful tool for searching Rho-dependent terminators in the three analyzed genomes and could fill this gap in computational genomics. We conclude that RhoTermPredict could be used together with an intrinsic terminator predictor to predict all transcription terminators in bacterial genomes.