DNABP: Identification of DNA-Binding Proteins Based on Feature Selection Using a Random Forest and Predicting Binding Residues.

Authors

Ma Xin, Guo Jing, Sun Xiao

Affiliations

School of Science, Nanjing Audit University, Nanjing, China.

State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China.

Publication information

PLoS One. 2016 Dec 1;11(12):e0167345. doi: 10.1371/journal.pone.0167345. eCollection 2016.

DOI:10.1371/journal.pone.0167345
PMID:27907159
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5132331/
Abstract

DNA-binding proteins are fundamentally important in cellular processes. Several computational-based methods have been developed to improve the prediction of DNA-binding proteins in previous years. However, insufficient work has been done on the prediction of DNA-binding proteins from protein sequence information. In this paper, a novel predictor, DNABP (DNA-binding proteins), was designed to predict DNA-binding proteins using the random forest (RF) classifier with a hybrid feature. The hybrid feature contains two types of novel sequence features, which reflect information about the conservation of physicochemical properties of the amino acids, and the binding propensity of DNA-binding residues and non-binding propensities of non-binding residues. The comparisons with each feature demonstrated that these two novel features contributed most to the improvement in predictive ability. Furthermore, to improve the prediction performance of the DNABP model, feature selection using the minimum redundancy maximum relevance (mRMR) method combined with incremental feature selection (IFS) was carried out during the model construction. The results showed that the DNABP model could achieve 86.90% accuracy, 83.76% sensitivity, 90.03% specificity and a Matthews correlation coefficient of 0.727. High prediction accuracy and performance comparisons with previous research suggested that DNABP could be a useful approach to identify DNA-binding proteins from sequence information. The DNABP web server system is freely available at http://www.cbi.seu.edu.cn/DNABP/.
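The abstract reports accuracy, sensitivity, specificity and a Matthews correlation coefficient, and describes choosing a feature subset by incremental feature selection (IFS) over an mRMR ranking. The sketch below illustrates both ideas in plain Python. It is not the authors' implementation: the function names, the `score_subset` callback, and the toy numbers are assumptions made for illustration, and the mRMR ranking and random forest training are taken as given rather than implemented.

```python
from math import sqrt


def binary_metrics(tp, fn, tn, fp):
    """Compute the four metrics named in the abstract from confusion counts.

    tp/fn count DNA-binding proteins predicted correctly/incorrectly;
    tn/fp count non-binding proteins predicted correctly/incorrectly.
    """
    total = tp + fn + tn + fp
    accuracy = (tp + tn) / total if total else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally reported as 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


def incremental_feature_selection(ranked_features, score_subset):
    """Evaluate growing prefixes of a ranked feature list and keep the best.

    ranked_features: features ordered best-first (for example by mRMR).
    score_subset: hypothetical callback that trains and evaluates a model
        on the given features and returns one number to maximise.
    Ties are resolved in favour of the smaller subset.
    """
    best_k, best_score = 0, float("-inf")
    for k in range(1, len(ranked_features) + 1):
        score = score_subset(ranked_features[:k])
        if score > best_score:
            best_k, best_score = k, score
    return ranked_features[:best_k], best_score


# Toy usage with made-up counts and scores (not data from the paper).
metrics = binary_metrics(tp=80, fn=20, tn=90, fp=10)
toy_scores = {1: 0.70, 2: 0.80, 3: 0.78, 4: 0.80}
selected, best = incremental_feature_selection(
    ["f1", "f2", "f3", "f4"], lambda subset: toy_scores[len(subset)]
)
```

In practice `score_subset` would wrap a cross-validated classifier, and the score it returns (accuracy, MCC, or another measure) is a design choice the abstract does not specify.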

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5a71/5132331/e7d3b2d31fc5/pone.0167345.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5a71/5132331/5615f4175590/pone.0167345.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5a71/5132331/e8977993d3a6/pone.0167345.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5a71/5132331/df63e0fa024b/pone.0167345.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5a71/5132331/70f208134625/pone.0167345.g005.jpg
