Roider Helge G, Kanhere Aditi, Manke Thomas, Vingron Martin
Max Planck Institute for Molecular Genetics, Ihnestrasse 73, 14195 Berlin, Germany.
Bioinformatics. 2007 Jan 15;23(2):134-41. doi: 10.1093/bioinformatics/btl565. Epub 2006 Nov 10.
Theoretical efforts to understand the regulation of gene expression have traditionally centered on identifying transcription factor binding sites at specific DNA positions. More recently, these efforts have been supplemented by experimental data on the relative binding affinities of proteins for longer intergenic sequences. This raises the question of how far these two approaches converge. In this paper, we adopt a physical binding model to predict the relative binding affinity of a transcription factor for a given sequence.
We find that a significant fraction of genome-wide binding data in yeast can be accounted for by simple count matrices and a physical model with only two parameters. We demonstrate that our approach is both conceptually and practically more powerful than traditional methods, which require the selection of a cutoff. Our analysis yields biologically meaningful parameters that are suitable for predicting relative binding affinities in the absence of experimental binding data.
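The idea of turning a count matrix into a cutoff-free affinity score with two parameters can be illustrated with a minimal sketch. It assumes, as a simplification that may differ in detail from the TRAP implementation, that each window's mismatch energy is the sum over positions of ln((n_max + 1)/(n_base + 1)), scaled by 1/lam, and that each window is occupied with probability R0·exp(-E)/(1 + R0·exp(-E)); the predicted relative affinity is the sum of these occupancies. The names `COUNTS`, `mismatch_energy`, `affinity`, and `cutoff_hits`, the toy matrix, and the parameter values are hypothetical, and only the forward strand is scanned.

```python
import math

# Hypothetical count matrix for a width-4 motif (consensus ACGT):
# one dict per position giving how often each base was observed there.
COUNTS = [
    {"A": 8, "C": 0, "G": 1, "T": 1},
    {"A": 0, "C": 9, "G": 0, "T": 1},
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 0, "C": 1, "G": 0, "T": 9},
]


def mismatch_energy(site, counts, lam):
    """Energy of a site relative to the consensus (assumed functional form).

    Each position contributes ln((n_max + 1) / (n_base + 1)), so the most
    frequent base costs 0 and rarer bases cost more; the total is scaled
    by 1/lam. Only the letters A, C, G, T are supported.
    """
    energy = 0.0
    for base, column in zip(site, counts):
        n_max = max(column.values())
        energy += math.log((n_max + 1) / (column[base] + 1))
    return energy / lam


def affinity(seq, counts, lam, r0):
    """Cutoff-free score: sum of occupancies of every window in seq."""
    width = len(counts)
    total = 0.0
    for start in range(len(seq) - width + 1):
        energy = mismatch_energy(seq[start:start + width], counts, lam)
        weight = r0 * math.exp(-energy)
        total += weight / (1.0 + weight)  # occupancy of this window, in (0, 1)
    return total


def cutoff_hits(seq, counts, lam, max_energy):
    """Traditional alternative: count windows at or below an energy cutoff."""
    width = len(counts)
    return sum(
        1
        for start in range(len(seq) - width + 1)
        if mismatch_energy(seq[start:start + width], counts, lam) <= max_energy
    )
```

For example, `affinity("AAACGTAA", COUNTS, lam=0.7, r0=1.0)` combines one perfect consensus window (occupancy 0.5) with small contributions from the weaker windows, whereas `cutoff_hits` with `max_energy=1.0` reports exactly one site and ignores the rest. Moving the cutoff changes the hit count in discrete jumps, while the summed occupancy varies smoothly with `lam` and `r0`, which illustrates why a cutoff-free score can be convenient.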
The C source code for our TRAP program is freely available for non-commercial use at http://www.molgen.mpg.de/~manke/papers/TFaffinities/