Sequence-based prediction of DNA-binding residues in proteins with conservation and correlation information.

Affiliation

State Key Laboratory of Bioelectronics, School of Biological Science & Medical Engineering, Southeast University and Nanjing Audit University, Nanjing, P.R. China.

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2012 Nov-Dec;9(6):1766-75. doi: 10.1109/TCBB.2012.106.

Abstract

The recognition of DNA-binding residues in proteins is critical to understanding the mechanisms of DNA-protein interactions and gene expression, and to guiding drug design. A prediction method, DNABR (DNA Binding Residues), is therefore proposed for predicting DNA-binding residues in protein sequences using a random forest (RF) classifier with sequence-based features. Two novel types of sequence feature are proposed in this study. They capture the conservation of the physicochemical properties of amino acids, and the correlation between amino acids at different sequence positions in terms of those properties. The first type of feature combines evolutionary information with the conservation of physicochemical properties, while the second reflects the dependency effect of amino acids with regard to polarity charge and hydrophobic properties in the protein sequence. These two features, together with an orthogonal binary vector that encodes the identity of the 20 amino acid types, are used to build DNABR, a model that predicts DNA-binding residues in proteins. The DNABR model achieves a Matthews correlation coefficient (MCC) of 0.6586 and an overall accuracy (ACC) of 93.04 percent, with 68.47 percent sensitivity (SE) and 98.16 percent specificity (SP). Comparisons among the individual features show that the two novel features contribute most to the improvement in predictive ability. Furthermore, performance comparisons with other approaches show that DNABR performs well at detecting binding residues in putative DNA-binding proteins. The DNABR web server is freely available at http://www.cbi.seu.edu.cn/DNABR/.
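As a rough illustration of two ingredients named in the abstract, the sketch below shows one way an orthogonal binary (one-hot) vector for the 20 amino acid types could be built, and how SE, SP, ACC, and MCC follow from confusion-matrix counts using their standard definitions. It is not the DNABR implementation: the function names, the sliding window, and its `half_width` value are assumptions made for this example, and the abstract does not state the window size or padding scheme the authors used.

```python
import math

# The 20 standard amino acids in one-letter code (ordering is arbitrary here).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(residue):
    """Return a 20-dimensional orthogonal binary vector for one residue.

    Residues outside the 20 standard types (e.g. 'X') map to all zeros.
    """
    return [1 if residue == aa else 0 for aa in AMINO_ACIDS]


def window_features(sequence, center, half_width=2):
    """Concatenate one-hot vectors over a window centred on `center`.

    The window and its size are assumptions of this sketch, not values
    taken from the abstract. Positions beyond the sequence ends are
    padded with all-zero vectors.
    """
    features = []
    for pos in range(center - half_width, center + half_width + 1):
        if 0 <= pos < len(sequence):
            features.extend(one_hot(sequence[pos]))
        else:
            features.extend([0] * len(AMINO_ACIDS))
    return features


def binary_metrics(tp, fp, tn, fn):
    """Compute SE, SP, ACC, and MCC from confusion-matrix counts."""
    se = tp / (tp + fn)                      # sensitivity: recall on binding residues
    sp = tn / (tn + fp)                      # specificity: recall on non-binding residues
    acc = (tp + tn) / (tp + fp + tn + fn)    # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SE": se, "SP": sp, "ACC": acc, "MCC": mcc}
```

Binding residues are typically a small minority of all residues, so accuracy and specificity can be high even when sensitivity is modest. MCC accounts for all four confusion-matrix counts, which is why it is commonly reported alongside ACC for this kind of imbalanced task.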
