Petrillo Mauro, Silvestro Giustina, Di Nocera Pier Paolo, Boccia Angelo, Paolella Giovanni
CEINGE Biotecnologie Avanzate scarl, Via Comunale Margherita 482, 80145 Napoli, Italy.
BMC Genomics. 2006 Jul 4;7:170. doi: 10.1186/1471-2164-7-170.
Predicting secondary structures in the expressed sequences of bacterial genomes makes it possible to investigate the spontaneous folding of the corresponding RNA. This is particularly relevant in untranslated mRNA regions, where base pairing is less affected by interactions with the translation machinery. Relatively large stem-loops contribute significantly to the formation of more complex secondary structures, which are often important for the activity of sequence elements controlling gene expression.
A systematic analysis of the distribution of stem-loop structures (SLSs) in 40 wholly sequenced bacterial genomes is presented. SLSs were defined as stems of at least 12 bp bordering loops of 5 to 100 nt; G-U pairing within the stems was allowed. SLSs found in natural genomes are consistently more numerous and more stable than those expected to form by chance in sequences of comparable size and composition. The large majority of SLSs fall within protein-coding regions, but specific, non-random SLS sub-populations of higher stability were enriched within the intergenic regions of the chromosomes of several species. In low-GC firmicutes, most higher-stability intergenic SLSs resemble canonical rho-independent transcriptional terminators, but very frequently carry an additional A-rich stretch at the 5' end that is complementary to the 3' uridines. In all species, the SLS distribution within the intergenic space was clearly biased, with most SLSs concentrated on the 3' side of flanking CDSs. Some intergenic SLS regions belong to novel families of repeated sequences.
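The search criteria stated above (stem of at least 12 bp, loop of 5 to 100 nt, G-U pairing allowed) can be illustrated with a minimal Python sketch. This is not the authors' software: the function names `find_stem_loops` and `shuffled_control`, the restriction to perfect stems without mismatches or bulges, and the mononucleotide shuffle used as a composition-matched control are all assumptions made for illustration, and no stability (free energy) is computed.

```python
import random

# Watson-Crick pairs; G-U wobble pairs are added on request.
WC_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
GU_PAIRS = {("G", "U"), ("U", "G")}


def find_stem_loops(seq, min_stem=12, min_loop=5, max_loop=100, allow_gu=True):
    """Return (stem_start, stem_length, loop_length) for each perfect stem-loop.

    Assumption: stems are uninterrupted (no mismatches or bulges).
    stem_start is the 0-based index of the outermost base of the 5' arm.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA input, treat it as RNA
    pairs = WC_PAIRS | GU_PAIRS if allow_gu else WC_PAIRS
    n = len(seq)
    hits = []
    for i in range(n):  # i: innermost base of the 5' arm
        for loop in range(min_loop, max_loop + 1):
            j = i + loop + 1  # j: innermost base of the 3' arm
            if j >= n:
                break
            # If the two outermost loop bases pair and the shorter loop is
            # still legal, the same stem is reported from that inner position.
            if loop - 2 >= min_loop and (seq[i + 1], seq[j - 1]) in pairs:
                continue
            # Extend the stem outward from the innermost pair.
            k = 0
            while i - k >= 0 and j + k < n and (seq[i - k], seq[j + k]) in pairs:
                k += 1
            if k >= min_stem:
                hits.append((i - k + 1, k, loop))
    return hits


def shuffled_control(seq, rng=None):
    """Return a random sequence with the same length and base composition.

    Assumption: a simple mononucleotide shuffle stands in for the
    'comparable size and composition' control.
    """
    rng = rng or random.Random()
    return "".join(rng.sample(seq, len(seq)))


if __name__ == "__main__":
    # Toy sequence: 12 G, a 5-nt loop, 12 C, with short A flanks.
    toy = "AAA" + "G" * 12 + "AAAAA" + "C" * 12 + "AAA"
    print(find_stem_loops(toy))
    print(len(find_stem_loops(shuffled_control(toy, random.Random(0)))))
```

Counting the hits in a genome and in many shuffled copies of it gives a rough sense of how an excess of SLSs over chance expectation could be assessed; a real analysis would also need a thermodynamic stability score and a tolerance for imperfect stems.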
In-depth analysis of SLS features and distribution in 40 bacterial genomes revealed non-random populations of such structures in all species. Many of these structures are plausibly transcribed and might be involved in the control of transcription termination, or might serve as RNA elements that enhance either the stability or the turnover of cotranscribed mRNAs. Three previously undescribed families of repeated sequences were found in Yersiniae, Bordetellae and Enterococci.