Towsey Michael, Timms Peter, Hogan James, Mathews Sarah A
School of Life Sciences, Faculty of Science, Queensland University of Technology, Brisbane, Queensland, Australia.
Comput Biol Chem. 2008 Oct;32(5):359-66. doi: 10.1016/j.compbiolchem.2008.07.009. Epub 2008 Jul 15.
Because the observed binding sites are degenerate, the in silico prediction of bacterial sigma(70)-like promoters remains a challenging problem. Large numbers of sigma(70)-like promoters have been biologically identified in only two species, Escherichia coli and Bacillus subtilis. In this paper we investigate the issues that arise when searching for promoters in other species using an ensemble of SVM classifiers trained on E. coli promoters. DNA sequences are represented using a tagged mismatch string kernel. The major benefit of our approach is that it does not require a prior definition of the typical -35 and -10 hexamers, which leaves the SVM classifiers free to discover other features relevant to the prediction of promoters. We use our approach to predict sigma(A) promoters in B. subtilis and sigma(66) promoters in Chlamydia trachomatis. We then extend the analysis to identify specific regulatory features of gene sets in C. trachomatis that have different expression profiles, and find a strong -35 hexamer and a TGN/-10 motif associated with a set of early expressed genes. Our analysis highlights the advantage of using TSS-PREDICT as a starting point for predicting promoters in species where few are known.
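To make the sequence representation concrete, the following is a minimal sketch of a mismatch string kernel with tagged k-mers. The abstract does not define the tagging scheme or the kernel parameters, so everything specific here is an assumption for illustration only: the tag is taken to be a coarse position bin of the k-mer start, and the names `neighbours`, `tagged_features`, `kernel`, and the parameters `k`, `m`, `bin_width` are hypothetical and not taken from the paper.

```python
from collections import Counter
from itertools import combinations, product

ALPHABET = "ACGT"


def neighbours(kmer, m):
    """Return all strings over ALPHABET within Hamming distance m of kmer."""
    result = {kmer}
    for d in range(1, m + 1):
        for positions in combinations(range(len(kmer)), d):
            for letters in product(ALPHABET, repeat=d):
                chars = list(kmer)
                for pos, letter in zip(positions, letters):
                    chars[pos] = letter
                result.add("".join(chars))
    return result


def tagged_features(seq, k=3, m=1, bin_width=10):
    """Count (tag, k-mer) features of seq.

    Assumption: the tag is the position bin of the k-mer start
    (start index // bin_width). Each observed k-mer contributes to
    every k-mer within m mismatches of it.
    """
    counts = Counter()
    for i in range(len(seq) - k + 1):
        tag = i // bin_width
        for nb in neighbours(seq[i:i + k], m):
            counts[(tag, nb)] += 1
    return counts


def kernel(x, y, **params):
    """Inner product of the tagged mismatch feature vectors of x and y."""
    fx = tagged_features(x, **params)
    fy = tagged_features(y, **params)
    return sum(value * fy[key] for key, value in fx.items())
```

Under these assumptions, two sequences score highly only when they share similar k-mers in the same position bin, so no -35 or -10 hexamer has to be specified in advance. A Gram matrix built from `kernel` could then be passed to any SVM implementation that accepts a precomputed kernel.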