Persikov Anton V, Osada Robert, Singh Mona
Lewis-Sigler Institute for Integrative Genomics and Department of Computer Science, Princeton University, Princeton, NJ 08544, USA.
Bioinformatics. 2009 Jan 1;25(1):22-9. doi: 10.1093/bioinformatics/btn580. Epub 2008 Nov 13.
Cys(2)His(2) zinc finger (ZF) proteins represent the largest class of eukaryotic transcription factors. Their modular structure and well-conserved protein-DNA interface allow the development of computational approaches for predicting their DNA-binding preferences even when no binding sites are known for a particular protein. The 'canonical model' for ZF protein-DNA interaction consists of only four amino acid-nucleotide contacts per zinc finger domain.
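To make the canonical model concrete, the following is a minimal sketch of one way a single finger-DNA pairing could be turned into a feature vector for a learning method. The one-hot layout, the names `FEATURE_INDEX` and `encode_finger`, and the example contacts are illustrative assumptions and are not taken from the paper; the caller is assumed to already know which amino acid faces which base at each of the four contacts.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
BASES = "ACGT"
N_CONTACTS = 4  # contacts per finger in the canonical model

# Assumed layout: one feature per (contact index, amino acid, base) combination.
FEATURE_INDEX = {
    key: i
    for i, key in enumerate(product(range(N_CONTACTS), AMINO_ACIDS, BASES))
}


def encode_finger(contacts):
    """Encode four (amino_acid, base) pairs as a 0/1 vector.

    `contacts` lists the pairs in a fixed contact order chosen by the caller.
    """
    if len(contacts) != N_CONTACTS:
        raise ValueError("expected exactly four contacts")
    vec = [0] * len(FEATURE_INDEX)
    for i, (aa, base) in enumerate(contacts):
        vec[FEATURE_INDEX[(i, aa, base)]] = 1
    return vec


# Hypothetical contacts, for illustration only.
example = encode_finger([("R", "G"), ("D", "C"), ("E", "C"), ("R", "G")])
```

Under this layout each finger yields a vector of length 4 × 20 × 4 = 320 with exactly four non-zero entries, one per contact.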
We present an approach for predicting ZF binding based on support vector machines (SVMs). While most previous computational approaches have been based solely on examples of known ZF protein-DNA interactions, ours additionally incorporates information about protein-DNA pairs known to bind weakly or not at all. Moreover, SVMs with a linear kernel can naturally incorporate constraints on the relative binding affinities of protein-DNA pairs; this type of information has not previously been used in predicting ZF protein-DNA binding. Here, we build a high-quality, literature-derived experimental database of ZF-DNA binding examples and use it to test both linear and polynomial kernels for predicting ZF protein-DNA binding on the basis of the canonical binding model. The polynomial SVM outperforms both the linear SVM and previously published prediction procedures. This may indicate the presence of dependencies between contacts in the canonical binding model, and suggests that modifying the underlying structural model may further improve performance in predicting ZF protein-DNA binding. Overall, this work demonstrates that methods incorporating information about non-binding and relative binding of protein-DNA pairs have great potential for effective prediction of protein-DNA interactions.
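The idea that a linear model can absorb relative-affinity information can be sketched as follows: if pair a binds more strongly than pair b, one can require w·(x_a − x_b) ≥ 1, which has the same form as an ordinary classification constraint. The code below is a simplified illustration under stated assumptions, not the authors' formulation: it omits the bias term, uses plain subgradient descent on a regularized hinge loss instead of a dedicated SVM solver, and all names (`train_linear`, `score`, `poly_kernel`) and the toy data are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear(binders, non_binders, stronger_than, dim,
                 epochs=200, lr=0.1, reg=0.01):
    """Fit w so that every constraint vector z satisfies w . z >= 1.

    binders / non_binders: lists of feature vectors.
    stronger_than: list of (x_strong, x_weak) feature-vector pairs.
    """
    # All three kinds of information reduce to the same constraint form.
    constraints = [list(x) for x in binders]                       # w.x >= 1
    constraints += [[-v for v in x] for x in non_binders]          # w.x <= -1
    constraints += [[a - b for a, b in zip(xs, xw)]                # w.(xs-xw) >= 1
                    for xs, xw in stronger_than]
    w = [0.0] * dim
    for _ in range(epochs):
        for z in constraints:
            violated = dot(w, z) < 1
            for i in range(dim):
                # Subgradient of reg/2 * |w|^2 + max(0, 1 - w.z)
                grad = reg * w[i] - (z[i] if violated else 0.0)
                w[i] -= lr * grad
    return w


def score(w, x):
    """Higher score means stronger predicted binding."""
    return dot(w, x)


def poly_kernel(x, y, degree=2, c=1.0):
    """Polynomial kernel (x . y + c) ** degree."""
    return (dot(x, y) + c) ** degree


# Toy 3-feature data, for illustration only.
w = train_linear(
    binders=[[1, 0, 0]],
    non_binders=[[0, 1, 0]],
    stronger_than=[([1, 0, 1], [1, 0, 0])],
    dim=3,
)
```

On the toy data the learned weights rank the binder above the non-binder and the stronger pair above the weaker one. The `poly_kernel` helper hints at why a polynomial kernel can capture dependencies between contacts: expanding a degree-2 kernel produces products of pairs of features, so the joint presence of two contacts can carry its own weight, which a linear kernel cannot express.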
An online tool for predicting ZF DNA binding is available at http://compbio.cs.princeton.edu/zf/.