Bioinformatics Centre, Bose Institute, P-1/12 CIT Scheme VIIM, Kolkata 700 054, India.
Nucleic Acids Res. 2012 Aug;40(15):7150-61. doi: 10.1093/nar/gks405. Epub 2012 May 27.
We present a set of four parameters that, in combination, can predict DNA-binding residues on protein structures with a high degree of accuracy. These are the number of evolutionarily conserved residues (N(cons)), their spatial clustering (ρ(e)), hydrogen bond donor capability (D(p)) and residue propensity (R(p)). We first used these parameters to characterize 130 interfaces in a set of 126 DNA-binding proteins (DBPs). We then analyzed how well the parameters, individually and in combination, distinguish the true binding region from the rest of the protein surface. R(p) performs best, ranking the true interface first in 83% of cases. Importantly, we also tested the method on the unbound-bound test cases of the protein-DNA docking benchmark. When applied to the unbound form of the DBPs, R(p) can distinguish the true interface in 86% of cases. Finally, we applied a support vector machine (SVM) approach to recognize the interface region, using the above parameters along with the individual amino acid composition as attributes. The prediction accuracy is 90.5% for the bound structures and 93.6% for the unbound form of the proteins.
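The idea of ranking candidate surface patches by a residue propensity score can be illustrated with a minimal sketch. The abstract does not give the formula for R(p), so the code below assumes a common form: the log ratio of a residue type's frequency in known interfaces to its frequency on the whole protein surface, averaged over the residues of a patch. The function names, the pseudocount, and the toy residue lists are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def log_propensities(interface_residues, surface_residues, pseudocount=1.0):
    """Log ratio of each residue type's frequency in interfaces vs. on the surface.

    Assumed definition (not from the paper); the pseudocount avoids log(0).
    """
    iface = Counter(interface_residues)
    surf = Counter(surface_residues)
    types = set(iface) | set(surf)
    n_iface = sum(iface.values()) + pseudocount * len(types)
    n_surf = sum(surf.values()) + pseudocount * len(types)
    return {
        aa: math.log(
            ((iface[aa] + pseudocount) / n_iface)
            / ((surf[aa] + pseudocount) / n_surf)
        )
        for aa in types
    }


def patch_score(patch, log_prop):
    """Mean log propensity over the residues of a patch (unknown types count as 0)."""
    return sum(log_prop.get(aa, 0.0) for aa in patch) / len(patch)


def rank_patches(patches, log_prop):
    """Return patch names sorted from highest to lowest propensity score."""
    return sorted(
        patches,
        key=lambda name: patch_score(patches[name], log_prop),
        reverse=True,
    )


# Toy data, invented purely for illustration (one-letter amino acid codes).
interface_residues = list("RRRKKKSTNQGRK")
surface_residues = list("RRRKKKSTNQGRKDDDEEELLAAVVDEDE")
log_prop = log_propensities(interface_residues, surface_residues)

patches = {
    "patch_A": list("RKRSK"),  # rich in Arg/Lys
    "patch_B": list("DEELA"),  # rich in acidic and hydrophobic residues
    "patch_C": list("RDSLK"),  # mixed composition
}
ranking = rank_patches(patches, log_prop)
print(ranking)
```

With this toy data the Arg/Lys-rich patch ranks first. In a real evaluation, the "top rank" statistic would be the fraction of proteins for which the known interface patch comes first in such a ranking.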