de Hoon M J L, Makita Y, Imoto S, Kobayashi K, Ogasawara N, Nakai K, Miyano S
Human Genome Center, Institute of Medical Science, University of Tokyo, Shirokanedai, Minato-ku, Tokyo, Japan.
Bioinformatics. 2004 Aug 4;20 Suppl 1:i101-8. doi: 10.1093/bioinformatics/bth927.
Sigma factors regulate the expression of genes in Bacillus subtilis at the transcriptional level. We assess the accuracy of four approaches in predicting gene regulation by sigma factors from gene expression data: fold-change analysis, Bayesian networks, dynamic models, and supervised learning based on coregulation. To improve the prediction accuracy, we combine sequence information with expression data in two ways: by adding their log-likelihood scores and by using a logistic regression model. We use the resulting score function to discover currently unknown gene regulations by sigma factors.
The coregulation-based supervised learning method predicted sigma factor regulation from expression data most accurately. We found that the logistic regression model effectively combines expression data with sequence information. In a genome-wide search, highly significant logistic regression scores were found for several genes whose transcriptional regulation is currently unknown. We provide the corresponding RNA polymerase binding sites to enable straightforward experimental verification of these predictions.
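The two ways of combining expression and sequence evidence mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the two-feature layout (one expression log-likelihood score and one sequence log-likelihood score per gene), the gradient-ascent fitting procedure, and the toy numbers are all assumptions made for illustration only.

```python
import math


def additive_score(expr_llr, seq_llr):
    """Combine the two log-likelihood scores by simple addition."""
    return expr_llr + seq_llr


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(features, labels, lr=0.1, epochs=2000):
    """Fit [bias, w_expr, w_seq] by batch gradient ascent on the log-likelihood."""
    w = [0.0, 0.0, 0.0]
    n = len(features)
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for (x_expr, x_seq), y in zip(features, labels):
            p = sigmoid(w[0] + w[1] * x_expr + w[2] * x_seq)
            err = y - p
            grad[0] += err
            grad[1] += err * x_expr
            grad[2] += err * x_seq
        w = [wi + lr * g / n for wi, g in zip(w, grad)]
    return w


def logistic_score(w, expr_llr, seq_llr):
    """Estimated probability that a gene is regulated by the sigma factor."""
    return sigmoid(w[0] + w[1] * expr_llr + w[2] * seq_llr)


# Hypothetical toy data: (expression score, sequence score) per gene,
# with label 1 for a known target of the sigma factor and 0 otherwise.
features = [(2.1, 1.5), (1.8, 0.2), (0.3, 1.9), (1.2, 1.1),
            (-1.0, -0.5), (-0.2, -1.7), (-1.5, 0.4), (-0.8, -1.2)]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

weights = fit_logistic(features, labels)
```

Unlike plain addition, which weights both scores equally, the logistic regression learns a separate weight for each data source from genes with known regulation, so a less informative source contributes less to the final score. In this sketch, `logistic_score(weights, expr_llr, seq_llr)` would then be evaluated for genes whose regulation is unknown.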