School of Computer, Wuhan University, Wuhan 430072, People's Republic of China.
Proteins. 2011 Feb;79(2):509-17. doi: 10.1002/prot.22898.
Proteins that interact with DNA play vital roles in all mechanisms of gene expression and regulation. To understand these activities, it is crucial to analyze and identify the DNA-binding residues on DNA-binding protein surfaces. Here, we propose two novel features, B-factor and packing density, which we combine with several conventional features to characterize DNA-binding residues in a well-constructed representative dataset of 119 protein-DNA complexes from the Protein Data Bank (PDB). Based on the selected features, we constructed a support vector machine (SVM) model to predict DNA-binding residues. The predictor was evaluated by 5-fold cross-validation on the above dataset of 123 DNA-binding proteins. Moreover, we compiled two independent datasets of 83 DNA-bound protein structures and their corresponding DNA-free forms, and statistically analyzed the B-factor and packing density features on these 83 pairs of holo-apo protein structures. Finally, we developed an SVM model that accurately predicts DNA-binding residues on the protein surface given only the DNA-free structure of a protein. The results indicate that our method is a significant improvement over previously existing approaches such as DISPLAR, which suggests that it will be useful in studying protein-DNA interactions and in guiding subsequent work such as site-directed mutagenesis and protein-DNA docking.
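The abstract does not define how the two structural features are computed or how the cross-validation folds were formed, so the following is only a minimal sketch under stated assumptions. It assumes a hypothetical atom record `(residue_id, x, y, z, b_factor)`, takes the B-factor feature to be the per-residue mean B-factor z-scored over the chain, takes packing density to be the number of atoms from other residues within an assumed 8 Å of the residue centroid, and splits proteins (not residues) into 5 folds to avoid leakage between training and test data. None of these function names or parameter values come from the paper, and the SVM training step itself is omitted.

```python
import math
import random
from statistics import mean, pstdev

# Hypothetical minimal atom record: (residue_id, x, y, z, b_factor),
# e.g. parsed from the ATOM lines of a PDB file.


def bfactor_feature(atoms):
    """Per-residue mean B-factor, z-scored over the chain (assumed normalization)."""
    per_res = {}
    for res_id, _x, _y, _z, b in atoms:
        per_res.setdefault(res_id, []).append(b)
    res_means = {r: mean(bs) for r, bs in per_res.items()}
    vals = list(res_means.values())
    mu = mean(vals)
    sd = pstdev(vals) or 1.0  # avoid division by zero if all means are equal
    return {r: (m - mu) / sd for r, m in res_means.items()}


def packing_density_feature(atoms, radius=8.0):
    """Assumed definition: count of atoms belonging to other residues that lie
    within `radius` angstroms of the residue's centroid."""
    coords = {}
    for res_id, x, y, z, _b in atoms:
        coords.setdefault(res_id, []).append((x, y, z))
    density = {}
    for res_id, pts in coords.items():
        centroid = tuple(sum(p[i] for p in pts) / len(pts) for i in range(3))
        density[res_id] = sum(
            1
            for other_id, x, y, z, _b in atoms
            if other_id != res_id and math.dist(centroid, (x, y, z)) <= radius
        )
    return density


def protein_level_folds(protein_ids, k=5, seed=0):
    """Split proteins (not residues) into k folds, so residues of one protein
    never appear in both the training and the test data of a fold."""
    ids = sorted(set(protein_ids))
    random.Random(seed).shuffle(ids)
    return [ids[i::k] for i in range(k)]
```

In this sketch, each residue's feature vector (these two values plus any conventional features) would be passed to an SVM classifier trained on four folds and tested on the fifth, repeating over all five folds.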