Sequence-based prediction of DNA-binding residues in proteins with conservation and correlation information.

Affiliation

State Key Laboratory of Bioelectronics, School of Biological Science & Medical Engineering, Southeast University and Nanjing Audit University, Nanjing, P.R. China.

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2012 Nov-Dec;9(6):1766-75. doi: 10.1109/TCBB.2012.106.

DOI:10.1109/TCBB.2012.106

PMID:22868682

Abstract

Recognizing DNA-binding residues in proteins is critical to understanding the mechanisms of DNA-protein interactions and gene expression, and to guiding drug design. A prediction method, DNABR (DNA Binding Residues), is therefore proposed for predicting DNA-binding residues in protein sequences using a random forest (RF) classifier with sequence-based features. Two novel types of sequence feature are proposed in this study: one reflects the conservation of the physicochemical properties of amino acids, and the other reflects the correlation of amino acids between different sequence positions in terms of physicochemical properties. The first type of feature combines evolutionary information with the conservation of physicochemical properties of the amino acids, while the second reflects the dependency effect of amino acids with regard to polarity charge and hydrophobic properties in the protein sequences. These two features, together with an orthogonal binary vector that reflects the characteristics of the 20 types of amino acids, are used to build DNABR, a model to predict DNA-binding residues in proteins. The DNABR model achieves a Matthews correlation coefficient (MCC) of 0.6586 and an overall accuracy (ACC) of 93.04 percent, with a sensitivity (SE) of 68.47 percent and a specificity (SP) of 98.16 percent. Comparisons among the individual features demonstrate that the two novel features contribute most to the improvement in predictive ability. Furthermore, performance comparisons with other approaches show that DNABR has excellent prediction performance for detecting binding residues in putative DNA-binding proteins. The DNABR web server is freely available at http://www.cbi.seu.edu.cn/DNABR/.
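The four measures quoted in the abstract (SE, SP, ACC, MCC) are all derived from the counts of a binary confusion matrix, where a "positive" is a DNA-binding residue. The following is a minimal sketch of those standard definitions, not the authors' evaluation code. The function name `binary_metrics` and the example counts are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return SE, SP, ACC and MCC from confusion-matrix counts.

    tp/fn: binding residues predicted as binding / non-binding.
    tn/fp: non-binding residues predicted as non-binding / binding.
    """
    # Sensitivity: fraction of true binding residues that are recovered.
    se = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of non-binding residues correctly rejected.
    sp = tn / (tn + fp) if (tn + fp) else 0.0
    # Overall accuracy over all residues.
    acc = (tp + tn) / (tp + fp + tn + fn)
    # Matthews correlation coefficient; defined as 0 when a margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SE": se, "SP": sp, "ACC": acc, "MCC": mcc}


# Hypothetical counts for illustration only (not data from the paper).
metrics = binary_metrics(tp=70, fp=20, tn=880, fn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because binding residues are usually a small minority of all residues, accuracy and specificity can be high even when sensitivity is modest, which is why MCC is commonly reported alongside them for this kind of imbalanced classification.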

Similar articles

Sequence-based prediction of DNA-binding residues in proteins with conservation and correlation information.

IEEE/ACM Trans Comput Biol Bioinform. 2012 Nov-Dec;9(6):1766-75. doi: 10.1109/TCBB.2012.106.

Prediction of RNA-binding residues in proteins from primary sequence using an enriched random forest model with a novel hybrid feature.

Proteins. 2011 Apr;79(4):1230-9. doi: 10.1002/prot.22958. Epub 2011 Jan 25.

Prediction of DNA-binding residues in proteins from amino acid sequences using a random forest model with a hybrid feature.

Bioinformatics. 2009 Jan 1;25(1):30-5. doi: 10.1093/bioinformatics/btn583. Epub 2008 Nov 12.

DNABP: Identification of DNA-Binding Proteins Based on Feature Selection Using a Random Forest and Predicting Binding Residues.

PLoS One. 2016 Dec 1;11(12):e0167345. doi: 10.1371/journal.pone.0167345. eCollection 2016.

DP-Bind: a web server for sequence-based prediction of DNA-binding residues in DNA-binding proteins.

Bioinformatics. 2007 Mar 1;23(5):634-6. doi: 10.1093/bioinformatics/btl672. Epub 2007 Jan 19.

PRBP: Prediction of RNA-Binding Proteins Using a Random Forest Algorithm Combined with an RNA-Binding Residue Predictor.

IEEE/ACM Trans Comput Biol Bioinform. 2015 Nov-Dec;12(6):1385-93. doi: 10.1109/TCBB.2015.2418773.

Prediction of microRNA-binding residues in protein using a Laplacian support vector machine based on sequence information.

J Bioinform Comput Biol. 2018 Jun;16(3):1840009. doi: 10.1142/S0219720018400097. Epub 2018 Feb 4.

Identification of DNA-binding and protein-binding proteins using enhanced graph wavelet features.

IEEE/ACM Trans Comput Biol Bioinform. 2013 Jul-Aug;10(4):1017-31. doi: 10.1109/TCBB.2013.117.

Predicting Protein-DNA Binding Residues by Weightedly Combining Sequence-Based Features and Boosting Multiple SVMs.

IEEE/ACM Trans Comput Biol Bioinform. 2017 Nov-Dec;14(6):1389-1398. doi: 10.1109/TCBB.2016.2616469. Epub 2016 Oct 11.

Using evolutionary and structural information to predict DNA-binding sites on DNA-binding proteins.

Proteins. 2006 Jul 1;64(1):19-27. doi: 10.1002/prot.20977.

Cited by

Twenty years of advances in prediction of nucleic acid-binding residues in protein sequences.

Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbaf016.

Benchmarking recent computational tools for DNA-binding protein identification.

Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbae634.

A comprehensive review of protein-centric predictors for biomolecular interactions: from proteins to nucleic acids and beyond.

Brief Bioinform. 2024 Mar 27;25(3). doi: 10.1093/bib/bbae162.

EPDRNA: A Model for Identifying DNA-RNA Binding Sites in Disease-Related Proteins.

Protein J. 2024 Jun;43(3):513-521. doi: 10.1007/s10930-024-10183-3. Epub 2024 Mar 16.

ULDNA: integrating unsupervised multi-source language models with LSTM-attention network for high-accuracy protein-DNA binding site prediction.

Brief Bioinform. 2024 Jan 22;25(2). doi: 10.1093/bib/bbae040.

HybridDBRpred: improved sequence-based prediction of DNA-binding amino acids using annotations from structured complexes and disordered proteins.

Nucleic Acids Res. 2024 Jan 25;52(2):e10. doi: 10.1093/nar/gkad1131.

In Silico Methods for Identification of Potential Active Sites of Therapeutic Targets.

Molecules. 2022 Oct 20;27(20):7103. doi: 10.3390/molecules27207103.

iProDNA-CapsNet: identifying protein-DNA binding residues using capsule neural networks.

BMC Bioinformatics. 2019 Dec 27;20(Suppl 23):634. doi: 10.1186/s12859-019-3295-2.

SXGBsite: Prediction of Protein-Ligand Binding Sites Using Sequence Information and Extreme Gradient Boosting.

Genes (Basel). 2019 Nov 22;10(12):965. doi: 10.3390/genes10120965.

PDRLGB: precise DNA-binding residue prediction using a light gradient boosting machine.

BMC Bioinformatics. 2018 Dec 31;19(Suppl 19):522. doi: 10.1186/s12859-018-2527-1.
