An accurate feature-based method for identifying DNA-binding residues on protein surfaces.

Affiliation

School of Computer, Wuhan University, Wuhan 430072, People's Republic of China.

Publication information

Proteins. 2011 Feb;79(2):509-17. doi: 10.1002/prot.22898.

Abstract

Proteins that interact with DNA play vital roles in all mechanisms of gene expression and regulation. To understand these activities, it is crucial to analyze and identify DNA-binding residues on the surfaces of DNA-binding proteins. Here, we propose two novel features, B-factor and packing density, which we combine with several conventional features to characterize the DNA-binding residues in a well-constructed representative dataset of 119 protein-DNA complexes from the Protein Data Bank (PDB). Based on the selected features, a prediction model for DNA-binding residues was constructed using a support vector machine (SVM). The predictor was evaluated using 5-fold cross-validation on the above dataset of 123 DNA-binding proteins. Moreover, two independent datasets, comprising 83 DNA-bound protein structures and their corresponding DNA-free forms, were compiled. The B-factor and packing density features were statistically analyzed on these 83 pairs of holo-apo protein structures. Finally, we developed the SVM model to accurately predict DNA-binding residues on the protein surface given only the DNA-free structure of a protein. The results indicate that our method represents a significant improvement over existing approaches such as DISPLAR. This suggests that our method will be useful in studying protein-DNA interactions and in guiding subsequent work such as site-directed mutagenesis and protein-DNA docking.
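The abstract names B-factor and packing density as per-residue features but does not say how they are computed. The sketch below is one plausible reading, not the authors' procedure: it assumes the B-factor is normalized as a z-score within a chain and that packing density is the number of other residues whose representative atom lies within a 10 Å cutoff. The function names, the cutoff, and the toy coordinates are all hypothetical.

```python
import math
from statistics import mean, pstdev


def normalized_b_factors(b_factors):
    """Z-score the per-residue B-factors of one chain (assumed normalization)."""
    mu = mean(b_factors)
    sigma = pstdev(b_factors)
    if sigma == 0:
        # All values identical: no residue is more flexible than another.
        return [0.0 for _ in b_factors]
    return [(b - mu) / sigma for b in b_factors]


def packing_density(coords, radius=10.0):
    """Count, for each residue, the other residues within `radius` angstroms.

    `coords` holds one representative (x, y, z) point per residue,
    for example the C-alpha atom. The cutoff is an assumed value.
    """
    counts = []
    for i, a in enumerate(coords):
        neighbours = 0
        for j, b in enumerate(coords):
            if i != j and math.dist(a, b) <= radius:
                neighbours += 1
        counts.append(neighbours)
    return counts


def residue_features(coords, b_factors, radius=10.0):
    """Pair each residue's normalized B-factor with its packing density."""
    z_scores = normalized_b_factors(b_factors)
    densities = packing_density(coords, radius)
    return list(zip(z_scores, densities))


# Toy chain: three residues spaced 3.8 angstroms apart and one distant residue.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (30.0, 0.0, 0.0)]
toy_b_factors = [10.0, 20.0, 30.0, 40.0]
toy_features = residue_features(toy_coords, toy_b_factors)
```

Feature vectors of this kind, joined with the conventional features mentioned in the abstract, would then be the per-residue input to a classifier such as an SVM.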

Similar articles

Using evolutionary and structural information to predict DNA-binding sites on DNA-binding proteins.

Proteins. 2006 Jul 1;64(1):19-27. doi: 10.1002/prot.20977.

Design of accurate predictors for DNA-binding sites in proteins using hybrid SVM-PSSM method.

Biosystems. 2007 Jul-Aug;90(1):234-41. doi: 10.1016/j.biosystems.2006.08.007. Epub 2006 Aug 23.

Prediction of protein-RNA binding sites by a random forest method with combined features.

Bioinformatics. 2010 Jul 1;26(13):1616-22. doi: 10.1093/bioinformatics/btq253. Epub 2010 May 18.

Insights into the molecular recognition of the 5'-GNN-3' family of DNA sequences by zinc finger domains.

J Mol Biol. 2000 Nov 3;303(4):489-502. doi: 10.1006/jmbi.2000.4133.

PRINTR: prediction of RNA binding sites in proteins using SVM and profiles.

Amino Acids. 2008 Aug;35(2):295-302. doi: 10.1007/s00726-007-0634-9. Epub 2008 Jan 31.

Identifying protein-protein interaction sites in transient complexes with temperature factor, sequence profile and accessible surface area.

Amino Acids. 2010 Jan;38(1):263-70. doi: 10.1007/s00726-009-0245-8. Epub 2009 Feb 12.

Correlation and prediction of gene expression level from amino acid and dipeptide composition of its protein.

BMC Bioinformatics. 2005 Mar 17;6:59. doi: 10.1186/1471-2105-6-59.

Shape string: a new feature for prediction of DNA-binding residues.

Biochimie. 2013 Feb;95(2):354-8. doi: 10.1016/j.biochi.2012.10.006. Epub 2012 Oct 29.

Support Vector Machine-based classification of protein folds using the structural properties of amino acid residues and amino acid residue pairs.

Bioinformatics. 2007 Dec 15;23(24):3320-7. doi: 10.1093/bioinformatics/btm527. Epub 2007 Nov 7.

Cited by

Comparative Analysis of pKa Predictions for Arsonic Acids Using Density Functional Theory-Based and Machine Learning Approaches.

ACS Omega. 2025 Jan 16;10(3):3128-3140. doi: 10.1021/acsomega.4c10413. eCollection 2025 Jan 28.

DNAPred_Prot: Identification of DNA-Binding Proteins Using Composition- and Position-Based Features.

Appl Bionics Biomech. 2022 Apr 13;2022:5483115. doi: 10.1155/2022/5483115. eCollection 2022.

Protein pKa Prediction by Tree-Based Machine Learning.

J Chem Theory Comput. 2022 Apr 12;18(4):2673-2686. doi: 10.1021/acs.jctc.1c01257. Epub 2022 Mar 15.

GIpred: a computational tool for prediction of GIGANTEA proteins using machine learning algorithm.

Physiol Mol Biol Plants. 2022 Jan;28(1):1-16. doi: 10.1007/s12298-022-01130-6. Epub 2022 Jan 24.

DeepDISE: DNA Binding Site Prediction Using a Deep Learning Method.

Int J Mol Sci. 2021 May 24;22(11):5510. doi: 10.3390/ijms22115510.

The complexity of protein interactions unravelled from structural disorder.

PLoS Comput Biol. 2021 Jan 8;17(1):e1008546. doi: 10.1371/journal.pcbi.1008546. eCollection 2021 Jan.

Multiple protein-DNA interfaces unravelled by evolutionary information, physico-chemical and geometrical properties.

PLoS Comput Biol. 2020 Feb 3;16(2):e1007624. doi: 10.1371/journal.pcbi.1007624. eCollection 2020 Feb.

PredPSD: A Gradient Tree Boosting Approach for Single-Stranded and Double-Stranded DNA Binding Protein Prediction.

Molecules. 2019 Dec 26;25(1):98. doi: 10.3390/molecules25010098.

A novel model for malaria prediction based on ensemble algorithms.

PLoS One. 2019 Dec 26;14(12):e0226910. doi: 10.1371/journal.pone.0226910. eCollection 2019.

Is There Any Sequence Feature in the RNA Pseudouridine Modification Prediction Problem?

Mol Ther Nucleic Acids. 2020 Mar 6;19:293-303. doi: 10.1016/j.omtn.2019.11.014. Epub 2019 Nov 21.
