Voss Björn, Georg Jens, Schön Verena, Ude Susanne, Hess Wolfgang R
University of Freiburg, Faculty of Biology, Genetics and Experimental Bioinformatics, Freiburg, Germany.
BMC Genomics. 2009 Mar 23;10:123. doi: 10.1186/1471-2164-10-123.
In bacteria, non-coding RNAs (ncRNAs) are crucial regulators of gene expression, controlling various stress responses, virulence, and motility. Previous work revealed a relatively high number of ncRNAs in some marine cyanobacteria. However, for efficient genetic and biochemical analysis, it would be desirable to identify a set of ncRNA candidate genes in model cyanobacteria that are easy to manipulate and for which extensive mutant, transcriptomic and proteomic data sets are available.
Here we have used comparative genome analysis for the biocomputational prediction of ncRNA genes and other sequence/structure-conserved elements in intergenic regions of the three unicellular model cyanobacteria Synechocystis PCC6803, Synechococcus elongatus PCC6301 and Thermosynechococcus elongatus BP1 plus the toxic Microcystis aeruginosa NIES843. The unfiltered numbers of predicted elements in these strains are 383, 168, 168, and 809, respectively, combined into 443 sequence clusters, whereas the numbers of individual elements with high support are 94, 56, 64, and 406, respectively. After transposon-associated repeats are also removed, 78, 53, 42 and 168 sequences, respectively, remain, belonging to 109 different clusters in the data set. Experimental analysis of selected ncRNA candidates in Synechocystis PCC6803 validated new ncRNAs originating from the fabF-hoxH and apcC-prmA intergenic spacers and three highly expressed ncRNAs belonging to the Yfr2 family of ncRNAs. Yfr2a promoter-luxAB fusions confirmed a very strong activity of this promoter and indicated that expression is stimulated when the cultures are exposed to elevated light intensities.
Comparison to entries in Rfam and experimental testing of selected ncRNA candidates in Synechocystis PCC6803 indicate a high reliability of the current prediction, despite some contamination by the high number of repetitive sequences in some of these species. In particular, we identified a total of 8 new ncRNA homologs belonging to the Yfr2 family of ncRNAs across the four species. Modelling of RNA secondary structures indicated two conserved single-stranded sequence motifs that might be involved in RNA-protein interactions or in the recognition of target RNAs. Since our analysis was restricted to finding ncRNA candidates with a reasonably high degree of conservation among these four cyanobacteria, there might be many more, requiring direct experimental approaches for their identification.
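The effect of the successive filtering steps reported above can be tabulated with a short script. This is only an illustrative sketch: the counts are copied from the abstract, while the stage labels and the function names (`stage_totals`, `retained_fraction`) are hypothetical names chosen here and are not part of the authors' pipeline. The per-stage sums count individual elements and are therefore not expected to match the cluster counts (443 and 109), because a cluster can group elements from several strains.

```python
# Counts per strain at each filtering stage, copied from the abstract.
# Stage labels are illustrative names, not terms from the authors' software.
STAGES = ("unfiltered", "high_support", "repeat_filtered")

COUNTS = {
    "Synechocystis PCC6803": (383, 94, 78),
    "Synechococcus elongatus PCC6301": (168, 56, 53),
    "Thermosynechococcus elongatus BP1": (168, 64, 42),
    "Microcystis aeruginosa NIES843": (809, 406, 168),
}


def stage_totals(counts):
    """Sum the number of predicted elements over all strains for each stage."""
    return tuple(
        sum(per_strain[i] for per_strain in counts.values())
        for i in range(len(STAGES))
    )


def retained_fraction(counts):
    """Fraction of unfiltered predictions left after both filtering steps."""
    return {
        strain: per_strain[-1] / per_strain[0]
        for strain, per_strain in counts.items()
    }


if __name__ == "__main__":
    print(dict(zip(STAGES, stage_totals(COUNTS))))
    for strain, fraction in retained_fraction(COUNTS).items():
        print(f"{strain}: {fraction:.1%} of unfiltered predictions retained")
```

Under these numbers, roughly a fifth to a third of the unfiltered predictions in each strain survive both filters, which is consistent with the abstract's remark that repetitive sequences contaminate the raw prediction in some species.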