Lillo Fabrizio, Basile Salvatore, Mantegna Rosario N
Istituto Nazionale per la Fisica della Materia, Unità di Palermo, Viale delle Scienze, I-90128, Palermo, Italy.
Bioinformatics. 2002 Jul;18(7):971-9. doi: 10.1093/bioinformatics/18.7.971.
Comparative genomics provides a powerful way to investigate regularities and differences observed at the DNA level across species. Here we study the number and location of inverted repeats occurring in complete bacterial genomes. Inverted repeats are compatible with the formation of hairpin structures in messenger RNA. Some of these structures are known to be rho-independent intrinsic terminators.
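To make the notion of an inverted repeat concrete, the following is a minimal sketch of a detector, not the procedure used in the study. It assumes a perfect stem (no mismatches) and a bounded loop; the function name `find_inverted_repeats` and the parameters `stem_len`, `min_loop` and `max_loop` are hypothetical choices made for illustration.

```python
# Illustrative sketch only: perfect-stem inverted repeats with a bounded loop.
# All names and default values here are assumptions, not taken from the paper.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(s):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return s.translate(COMPLEMENT)[::-1]


def find_inverted_repeats(seq, stem_len, min_loop=3, max_loop=10):
    """Return (start, stem_len, loop_len) for every perfect inverted repeat.

    A hit at `start` means seq[start:start+stem_len] is the reverse
    complement of the segment that begins `loop_len` bases after it,
    so the transcript could fold into a hairpin with that stem and loop.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    # Last start that still leaves room for two stems and the shortest loop.
    for i in range(n - 2 * stem_len - min_loop + 1):
        target = reverse_complement(seq[i:i + stem_len])
        for loop in range(min_loop, max_loop + 1):
            j = i + stem_len + loop  # start of the right arm
            if j + stem_len > n:
                break
            if seq[j:j + stem_len] == target:
                hits.append((i, stem_len, loop))
    return hits


# Toy sequence: stem GCGCC, loop TTTT, reverse-complement arm GGCGC.
example = "AA" + "GCGCC" + "TTTT" + "GGCGC" + "AA"
print(find_inverted_repeats(example, stem_len=5))
```

A realistic analysis would likely also allow mismatches or G-U pairing in the stem and would merge overlapping hits that belong to the same longer stem; this sketch does neither.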
We investigate the number of inverted repeats (IRs) observed in 37 complete bacterial genomes. In most of the eubacteria, the number of inverted repeats observed is much higher than expected from Markovian models of DNA sequences. Using the information annotated in the genomes, we find that in most of the eubacteria, inverted repeats with a stem length longer than 8 nucleotides are preferentially located near the 3' end of the nearest coding region. We also show that IRs with large stem lengths are preferentially located in short non-coding regions bounded by the 3' ends of two convergent genes. Using the program TransTerm, recently introduced to predict transcription terminators in bacterial genomes, we conclude that in several genomes only a part of the observed inverted repeats fulfils the model requirements characterizing rho-independent termination.
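The comparison between observed and expected counts can be illustrated with a rough sketch. It assumes the simplest possible null model, a zeroth-order Markov chain in which bases are drawn independently with the genome's own frequencies; the abstract does not state which Markov orders or which exact formula the authors used, so the function `expected_ir_count` and its parameters are hypothetical. Under this assumption, one stem position pairs with probability 2(p_A p_T + p_C p_G), and a perfect stem of length l forms with that probability raised to the power l.

```python
# Illustrative sketch only: expected number of perfect-stem inverted repeats
# under an independent-base (zeroth-order Markov) null model. This is an
# assumed simplification, not the model specification from the paper.
from collections import Counter


def expected_ir_count(seq, stem_len, min_loop=3, max_loop=10):
    """Expected count of (start, loop) placements that form a perfect stem."""
    seq = seq.upper()
    n = len(seq)
    counts = Counter(b for b in seq if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        return 0.0
    p = {b: counts[b] / total for b in "ACGT"}
    # Probability that one left-arm base pairs with its right-arm partner.
    pair_prob = 2 * (p["A"] * p["T"] + p["C"] * p["G"])
    stem_prob = pair_prob ** stem_len
    expected = 0.0
    for loop in range(min_loop, max_loop + 1):
        # Number of start positions that fit two stems plus this loop.
        positions = n - 2 * stem_len - loop + 1
        if positions > 0:
            expected += positions * stem_prob
    return expected


# With uniform base frequencies, pair_prob is 0.25.
print(expected_ir_count("ACGT" * 25, stem_len=2, min_loop=3, max_loop=3))
```

An observed count far above this kind of expectation is the signal the abstract describes; higher-order Markov models would replace the single-base frequencies with conditional probabilities estimated from the genome.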