Leelavati Narlikar, Raluca Gordân, Uwe Ohler, Alexander J. Hartemink
Department of Computer Science, Duke University, Durham, NC 27708, USA.
Bioinformatics. 2006 Jul 15;22(14):e384-92. doi: 10.1093/bioinformatics/btl251.
An important problem in molecular biology is to identify the locations at which a transcription factor (TF) binds to DNA, given a set of DNA sequences believed to be bound by that TF. In previous work, we showed that information in the DNA sequence of a binding site is sufficient to predict the structural class of the TF that binds it. In particular, this suggests that we can predict which locations in any DNA sequence are more likely than others to be bound by certain classes of TFs. Here, we argue that traditional methods for de novo motif finding can be significantly improved by adopting an informative prior probability that a TF binding site occurs at each sequence location. To demonstrate the utility of such an approach, we present PRIORITY, a powerful new de novo motif finding algorithm.
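The role of a positional prior can be illustrated with a minimal sketch. It is not the PRIORITY implementation: it assumes exactly one binding site per sequence, a fixed motif model, and a simple likelihood-ratio score, and the names `site_posterior`, `pwm`, `background`, and `prior` are hypothetical. It shows only how a prior over start positions reweights the evidence from the motif model.

```python
import math


def site_posterior(seq, pwm, background, prior):
    """Posterior over motif start positions in a single sequence.

    Illustrative assumptions (not taken from the paper):
      - exactly one site occurs in the sequence;
      - pwm is a list of dicts (one per motif column), base -> probability;
      - background is a dict, base -> probability;
      - prior holds one non-negative weight per valid start position.
    """
    width = len(pwm)
    n_starts = len(seq) - width + 1
    if len(prior) != n_starts:
        raise ValueError("prior must have one entry per valid start position")

    weights = []
    for start in range(n_starts):
        # Log-likelihood ratio of motif model versus background for this window.
        log_ratio = 0.0
        for offset in range(width):
            base = seq[start + offset]
            log_ratio += math.log(pwm[offset][base] / background[base])
        # Bayes' rule, unnormalized: prior times likelihood ratio.
        weights.append(prior[start] * math.exp(log_ratio))

    total = sum(weights)
    return [w / total for w in weights]


def make_column(consensus, high=0.85):
    """Hypothetical helper: one motif column favouring a consensus base."""
    low = (1.0 - high) / 3.0
    return {b: (high if b == consensus else low) for b in "ACGT"}


background = {b: 0.25 for b in "ACGT"}
pwm = [make_column(b) for b in "TTG"]
seq = "ACGTTGCA"

uniform_prior = [1.0 / 6] * 6
informative_prior = [0.5, 0.1, 0.1, 0.1, 0.1, 0.1]

post_uniform = site_posterior(seq, pwm, background, uniform_prior)
post_informative = site_posterior(seq, pwm, background, informative_prior)
```

With a uniform prior the posterior depends on the motif model alone; a non-uniform prior shifts probability mass toward the positions it favours, which is the effect the class-specific priors are meant to exploit.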
Using data from TRANSFAC, we train three classifiers to recognize binding sites of basic leucine zipper, forkhead, and basic helix-loop-helix TFs. These classifiers are used to equip PRIORITY with three class-specific priors, in addition to a default prior to handle TFs of other classes. We apply PRIORITY and a number of popular motif finding programs to sets of yeast intergenic regions that are reported by ChIP-chip to be bound by particular TFs. PRIORITY identifies motifs the other methods fail to identify, and correctly predicts the structural class of the TF recognizing the identified binding sites.
Supplementary material and code can be found at http://www.cs.duke.edu/~amink/.