Tan Kai, McCue Lee Ann, Stormo Gary D
Department of Genetics, Washington University School of Medicine, Saint Louis, Missouri 63110, USA.
Genome Res. 2005 Feb;15(2):312-20. doi: 10.1101/gr.3069205. Epub 2005 Jan 14.
The key components of a transcriptional regulatory network are the connections between trans-acting transcription factors and cis-acting DNA-binding sites. Despite several decades of intense research, only a fraction of the approximately 300 transcription factors estimated in Escherichia coli have been linked to any of their binding sites in the genome. In this paper, we present a computational method to connect novel transcription factors and DNA motifs in E. coli. Our method uses three mutually independent types of information: two are gleaned by comparative analysis of multiple genomes, and the third is derived from similarities of transcription-factor-DNA-binding-site interactions. The different types of information are combined to calculate the probability of a given transcription-factor-DNA-motif pair being a true pair. Tested on a study set of transcription factors and their DNA motifs, our method has a prediction accuracy of 59% for the top predictions and 85% for the top three predictions. When applied to 99 novel transcription factors and 70 novel DNA motifs, our method predicted 64 transcription-factor-DNA-motif pairs. Supporting evidence for some of the predicted pairs is presented. Functional annotations are made for 23 novel transcription factors based on the predicted transcription-factor-DNA-motif connections.
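The abstract states that three mutually independent types of information are combined into a probability that a transcription-factor-DNA-motif pair is true, but it does not give the formula. The sketch below shows one generic way such a combination can work: a naive-Bayes update in which each evidence type contributes a likelihood ratio. It is an illustration only, not the authors' scoring scheme. The function names, the prior of 0.01, and the likelihood-ratio values are all hypothetical.

```python
from math import prod


def posterior_true_pair(likelihood_ratios, prior=0.01):
    """Combine independent evidence into P(true pair | evidence).

    likelihood_ratios: one value per evidence type, each equal to
        P(evidence | true pair) / P(evidence | false pair).
        How these ratios are estimated is NOT specified here (assumption).
    prior: assumed prior probability that a random TF-motif pair is true
        (hypothetical default, not taken from the paper).
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    if any(lr < 0 for lr in likelihood_ratios):
        raise ValueError("likelihood ratios must be non-negative")
    # Independence lets the ratios multiply directly onto the prior odds.
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * prod(likelihood_ratios)
    return posterior_odds / (1.0 + posterior_odds)


def rank_motifs(tf_evidence, prior=0.01):
    """Rank candidate motifs for one transcription factor, best first.

    tf_evidence maps a motif name to its list of likelihood ratios.
    """
    scored = {
        motif: posterior_true_pair(lrs, prior)
        for motif, lrs in tf_evidence.items()
    }
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical evidence for one transcription factor and two motifs.
example = {
    "motifA": [8.0, 5.0, 3.0],  # all three evidence types favour the pair
    "motifB": [0.5, 2.0, 1.0],  # evidence is mixed and cancels out
}
ranking = rank_motifs(example)
```

Ranking candidates by the combined probability is what makes "top prediction" and "top three predictions" accuracies, as reported in the abstract, natural evaluation measures. The multiplication step is valid only if the evidence types really are independent given the pair's status, which is the property the abstract claims for its three information sources.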