Tomovic Andrija, Oakeley Edward J
Friedrich Miescher Institute for Biomedical Research, Novartis Research Foundation, Basel, Switzerland.
Bioinformatics. 2007 Apr 15;23(8):933-41. doi: 10.1093/bioinformatics/btm055. Epub 2007 Feb 18.
Most of the available tools for transcription factor binding site prediction are based on methods that assume the base positions within a binding site are independent of one another. Our primary objective was to investigate the statistical basis for a claim of either dependence or independence, to determine whether such a claim is generally true, and to use the resulting data to develop improved scoring functions for binding-site prediction.
Using three statistical tests, we analyzed the number of binding sites showing dependent positions. We also analyzed transcription factor-DNA crystal structures for evidence of position dependence. We conclude that some factors show evidence of dependencies whereas others do not. We observed that the conformational energy (Z-score) of the transcription factor-DNA complexes was lower (better) for sequences that showed dependency than for those that did not (P < 0.02). We suggest that where evidence exists for dependencies, these should be modeled to improve binding-site predictions. However, when no significant dependency is found, this correction should be omitted. This may be done by converting any existing scoring function that assumes independence into a form that includes a dependency correction. We present an example of such an algorithm and its implementation as a web tool.
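The idea of converting an independence-based scoring function into one with a dependency correction can be illustrated with a minimal sketch. It is not the authors' algorithm or web tool. It assumes a plain log-odds position weight matrix as the independent score, a Pearson chi-square test (which may or may not be one of the three tests used in the study) to flag dependent position pairs, and a greedy choice of non-overlapping pairs. All function names, the pseudocount, the uniform background, and the toy sites are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

BASES = "ACGT"
# Chi-square critical values at alpha = 0.05 for the degrees of freedom
# that can arise from a table of at most 4 x 4 bases.
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 6: 12.592, 9: 16.919}

def column_freqs(sites, pseudo=0.5):
    """Per-position base frequencies with a pseudocount (assumed value)."""
    n = len(sites)
    freqs = []
    for i in range(len(sites[0])):
        counts = Counter(s[i] for s in sites)
        freqs.append({b: (counts[b] + pseudo) / (n + 4 * pseudo) for b in BASES})
    return freqs

def pair_freqs(sites, i, j, pseudo=0.5):
    """Joint dinucleotide frequencies for positions i and j."""
    n = len(sites)
    counts = Counter((s[i], s[j]) for s in sites)
    return {(a, b): (counts[(a, b)] + pseudo) / (n + 16 * pseudo)
            for a in BASES for b in BASES}

def chi2_stat(sites, i, j):
    """Pearson chi-square statistic and degrees of freedom for positions i, j."""
    n = len(sites)
    ci = Counter(s[i] for s in sites)
    cj = Counter(s[j] for s in sites)
    cij = Counter((s[i], s[j]) for s in sites)
    stat = 0.0
    for a in ci:
        for b in cj:
            expected = ci[a] * cj[b] / n
            stat += (cij[(a, b)] - expected) ** 2 / expected
    return stat, (len(ci) - 1) * (len(cj) - 1)

def dependent_pairs(sites):
    """Greedily pick non-overlapping pairs that pass the test.

    No multiple-testing correction is applied in this sketch.
    """
    scored = []
    for i, j in combinations(range(len(sites[0])), 2):
        stat, df = chi2_stat(sites, i, j)
        if df > 0 and stat > CHI2_CRIT_05[df]:
            scored.append((stat, i, j))
    used, chosen = set(), []
    for _, i, j in sorted(scored, reverse=True):
        if i not in used and j not in used:
            chosen.append((i, j))
            used.update((i, j))
    return sorted(chosen)

def independent_score(seq, freqs, background=0.25):
    """Log-odds score that treats every position as independent."""
    return sum(math.log2(freqs[i][b] / background) for i, b in enumerate(seq))

def corrected_score(seq, sites, freqs, pairs):
    """Independent score plus a log-ratio correction for each dependent pair."""
    score = independent_score(seq, freqs)
    for i, j in pairs:
        joint = pair_freqs(sites, i, j)[(seq[i], seq[j])]
        score += math.log2(joint / (freqs[i][seq[i]] * freqs[j][seq[j]]))
    return score

# Toy sites: positions 0 and 1 always match, position 2 varies freely.
sites = ["AAG"] * 5 + ["AAC"] * 5 + ["TTG"] * 5 + ["TTC"] * 5
freqs = column_freqs(sites)
pairs = dependent_pairs(sites)
print(pairs, corrected_score("AAG", sites, freqs, pairs),
      corrected_score("ATG", sites, freqs, pairs))
```

When no pair passes the test, `pairs` is empty and `corrected_score` reduces to `independent_score`, which mirrors the recommendation to omit the correction where no significant dependency is found.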