Touzain Fabrice, Schbath Sophie, Debled-Rennesson Isabelle, Aigle Bertrand, Kucherov Gregory, Leblond Pierre
Laboratoire Lorrain de Recherche en Informatique et ses Applications, Campus Scientifique, B.P. 239, UMR CNRS-INPL-INRIA-Nancy 2-UHP 7503, 54506 Vandoeuvre-lès-Nancy, France.
BMC Bioinformatics. 2008 Jan 31;9:73. doi: 10.1186/1471-2105-9-73.
Many programs have been developed to identify transcription factor binding sites. However, most of them cannot infer two-word motifs with variable spacer lengths. This is the case for RNA polymerase sigma factor binding sites (SFBSs), which are usually composed of two boxes, called -35 and -10 in reference to the transcription initiation point. Our goal is to design an algorithm that detects SFBSs by using combinatorial and statistical constraints deduced from biological observations.
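To make the notion of a two-box motif with a variable-length spacer concrete, here is a minimal sketch that scans a sequence for an exact -35 word followed, after a spacer within a given length range, by an exact -10 word. It is not the SIGffRid algorithm: the function name `find_two_box_sites`, the exact-match criterion, and the default spacer range are assumptions made for illustration only.

```python
def find_two_box_sites(seq, box35, box10, min_spacer=15, max_spacer=19):
    """Return (start, spacer_length) for each exact two-box occurrence.

    Hypothetical helper: exact word matching and the default spacer
    range are illustrative assumptions, not values from the paper.
    """
    seq = seq.upper()
    box35 = box35.upper()
    box10 = box10.upper()
    hits = []
    start = seq.find(box35)
    while start != -1:
        box35_end = start + len(box35)
        # Try every allowed spacer length after this -35 occurrence.
        for spacer in range(min_spacer, max_spacer + 1):
            box10_start = box35_end + spacer
            if seq[box10_start:box10_start + len(box10)] == box10:
                hits.append((start, spacer))
        # Step by one so that overlapping -35 occurrences are not missed.
        start = seq.find(box35, start + 1)
    return hits


# Toy promoter: a -35 word, a 17-nt spacer, then a -10 word.
toy_promoter = "AA" + "TTGACA" + "C" * 17 + "TATAAT" + "GG"
sites = find_two_box_sites(toy_promoter, "TTGACA", "TATAAT")
```

A fixed-width motif model cannot represent such a site in general, because the distance between the two boxes changes from one promoter to another; iterating over the allowed spacer lengths is the simplest way to accommodate that.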
We describe a new approach to identify SFBSs by comparing two related bacterial genomes. The method, named SIGffRid (SIGma Factor binding sites Finder using R'MES to select Input Data), performs a simultaneous analysis of pairs of promoter regions of orthologous genes. SIGffRid uses a prior identification of over-represented patterns in whole genomes as the selection criterion for potential -35 and -10 boxes. These patterns are then grouped using pairs of short seeds (one of which may be gapped), allowing a variable-length spacer between them. Next, the motifs are extended under the guidance of statistical considerations, which ensures that the selected motifs have statistically relevant properties. We applied our method to the pair of related bacterial genomes of Streptomyces coelicolor and Streptomyces avermitilis. A detailed cross-check against the well-defined SFBSs of the SigR regulon in S. coelicolor validates the algorithm. SFBSs for HrdB and BldN were also found, and the results suggest some new targets for these sigma factors. In addition, consensus motifs for BldD and for new SFBSs were defined, overlapping previously proposed consensus sequences. Tests were also carried out on bacteria with moderate GC content (the Escherichia coli/Salmonella typhimurium and Bacillus subtilis/Bacillus licheniformis pairs). Motifs of house-keeping sigma factors were found, as well as other SFBSs such as that of SigW in Bacillus strains.
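The idea of selecting over-represented patterns can be illustrated with a deliberately simplified sketch. It compares the observed count of a word with the count expected if bases were drawn independently according to their frequencies in the sequence. This independent-base model is a stand-in chosen for brevity, not the statistical model used by R'MES or SIGffRid, and the names `count_overlapping` and `overrepresentation_ratio` are hypothetical.

```python
from collections import Counter


def count_overlapping(seq, word):
    """Count occurrences of word in seq, including overlapping ones."""
    count = 0
    pos = seq.find(word)
    while pos != -1:
        count += 1
        pos = seq.find(word, pos + 1)
    return count


def overrepresentation_ratio(seq, word):
    """Observed / expected count of word under an independent-base model.

    Simplifying assumption: each base is drawn independently with its
    observed frequency in seq. This only illustrates the principle.
    """
    seq = seq.upper()
    word = word.upper()
    n_positions = len(seq) - len(word) + 1
    if n_positions <= 0:
        return 0.0
    base_counts = Counter(seq)
    # Probability of seeing the word at one given position.
    p_word = 1.0
    for base in word:
        p_word *= base_counts[base] / len(seq)
    expected = n_positions * p_word
    if expected == 0:
        return 0.0
    return count_overlapping(seq, word) / expected


toy_genome = "ACGT" * 10
ratio_repeated = overrepresentation_ratio(toy_genome, "ACGT")
ratio_absent = overrepresentation_ratio(toy_genome, "AAAA")
```

A ratio well above 1 marks a word that occurs more often than base composition alone would predict. In practice a background model that accounts for shorter sub-words, together with a significance score rather than a raw ratio, would be needed before treating a word as a candidate box.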
We demonstrate that our approach, which combines statistical and biological criteria, successfully predicts SFBSs. The versatility of the method allows the recognition of other kinds of two-box regulatory sites.