Liu Zhijie, Mao Fenglou, Guo Jun-tao, Yan Bo, Wang Peng, Qu Youxing, Xu Ying
Computational Systems Biology Laboratory, Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA 30602, USA.
Nucleic Acids Res. 2005 Jan 26;33(2):546-58. doi: 10.1093/nar/gki204. Print 2005.
Computational evaluation of protein-DNA interactions is important for identifying DNA-binding sites and for genome annotation. By calculating the binding affinity between a protein and DNA, it can validate binding motifs predicted by sequence-based approaches. Such an evaluation should take structural information into account to handle the complicated effects of DNA structural deformation, distance-dependent multi-body interactions and solvation contributions. In this paper, we present a knowledge-based potential built on interactions between protein residues and DNA tri-nucleotides. The potential, which explicitly considers distance-dependent two-body, three-body and four-body interactions between protein residues and DNA nucleotides, has been optimized in terms of a Z-score. We applied this potential to evaluate the binding affinities of zinc-finger protein-DNA complexes; the predicted binding affinities agree well with the experimental data (correlation coefficient of 0.950). On a larger test set of 48 protein-DNA complexes with known experimental binding free energies, the potential achieved a correlation coefficient of 0.800 with the experimental data. We also used this potential to identify the binding motifs of transcription factors (TFs) in DNA sequences. In 79.4% of the known TF-DNA complexes, the TF correctly identified its native binding sequence from a large pool of DNA sequences. In a genome-scale search for binding motifs of the cyclic AMP regulatory protein (CRP) of Escherichia coli, the potential ranks all known CRP binding motifs in the top 15% of all candidate sequences.
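The three evaluation quantities mentioned in the abstract (a Z-score, a correlation coefficient, and a rank within a candidate pool) can be illustrated with a minimal sketch. It uses the conventional textbook definitions, which may differ from the exact formulas in the paper. The function names, the assumption that a lower energy means stronger binding, and all numeric values are hypothetical and serve only as illustration.

```python
import math
from statistics import mean, pstdev


def z_score(native_energy, decoy_energies):
    """Conventional Z-score of a native energy against a decoy distribution.

    Assumption: lower energy means stronger binding, so a more negative
    Z-score means the native complex is better separated from the decoys.
    """
    sigma = pstdev(decoy_energies)
    if sigma == 0:
        raise ValueError("decoy energies have zero spread")
    return (native_energy - mean(decoy_energies)) / sigma


def rank_fraction(target_energy, candidate_energies):
    """Fraction of candidates scoring at least as well as the target."""
    at_least_as_good = sum(1 for e in candidate_energies if e <= target_energy)
    return at_least_as_good / len(candidate_energies)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up energies, for illustration only.
    decoys = [-1.0, 0.0, 1.0]
    print(z_score(-5.0, decoys))
    print(rank_fraction(-3.0, [-3.0, -1.0, 0.0, 2.0]))
    print(pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

Under these definitions, a statement such as "ranked in the top 15%" corresponds to `rank_fraction` returning a value of at most 0.15 for a known binding motif.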