
DNA-Prot: identification of DNA binding proteins from protein sequence information using random forest.

Authors

Kumar K Krishna, Pugalenthi Ganesan, Suganthan P N

Affiliation

Institute for Neuro- and Bioinformatics, University of Lubeck, Lubeck 23538, Germany.

Publication

J Biomol Struct Dyn. 2009 Jun;26(6):679-86. doi: 10.1080/07391102.2009.10507281.

DOI: 10.1080/07391102.2009.10507281
PMID: 19385697

Abstract

DNA-binding proteins (DNABPs) are important for various cellular processes, such as transcriptional regulation, recombination, replication, repair, and DNA modification. So far, various bioinformatics and machine learning techniques have been applied to identify DNA-binding proteins from protein structure, but only a few methods are available for identifying them from protein sequence. In this work, we report a random forest method, DNA-Prot, to identify DNA-binding proteins from protein sequence. Training was performed on a dataset containing 146 DNA-binding proteins and 250 non-DNA-binding proteins. The algorithm was tested on a dataset containing 92 DNA-binding proteins and 100 non-DNA-binding proteins. We obtained 80.31% accuracy in training and 84.37% accuracy in testing. Benchmarking on an independent dataset of 823 DNA-binding proteins and 823 non-DNA-binding proteins shows that our approach can distinguish DNA-binding proteins from non-DNA-binding proteins with more than 80% accuracy. We also compared our method with the DNAbinder method on the test dataset and two independent datasets. The two methods performed comparably on the test dataset. On the benchmark dataset containing 823 DNA-binding proteins and 823 non-DNA-binding proteins, DNA-Prot performed significantly better, with 81.83% accuracy, whereas DNAbinder achieved only 61.42% accuracy using amino acid composition and 63.5% using PSSM profiles. Similarly, DNA-Prot performed better on the benchmark dataset containing 88 DNA-binding proteins and 233 non-DNA-binding proteins. These results show that DNA-Prot can be used efficiently to identify DNA-binding proteins from sequence information. The dataset and a standalone version of the DNA-Prot software can be obtained from http://www3.ntu.edu.sg/home/EPNSugan/index_files/dnaprot.htm.
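
To make the sequence-based setting concrete, here is a minimal sketch of two pieces the abstract mentions: amino acid composition (named in the abstract only as an input to DNAbinder) and classification accuracy. The abstract does not state which features DNA-Prot itself uses, so this is an illustration of one common sequence-derived feature, not the authors' method. The names `amino_acid_composition` and `accuracy` are hypothetical helpers introduced here.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so every vector lines up.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return a 20-element list with the fraction of each standard residue.

    Hypothetical helper: non-standard characters (e.g. 'X') are ignored.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    if len(true_labels) != len(predicted_labels) or not true_labels:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)
```

A vector produced this way could serve as one row of the feature matrix passed to any classifier, such as a random forest; labels would be 1 for DNA-binding and 0 for non-DNA-binding proteins.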


Similar articles

1
DNA-Prot: identification of DNA binding proteins from protein sequence information using random forest.
J Biomol Struct Dyn. 2009 Jun;26(6):679-86. doi: 10.1080/07391102.2009.10507281.
2
Identification of functionally diverse lipocalin proteins from sequence information using support vector machine.
Amino Acids. 2010 Aug;39(3):777-83. doi: 10.1007/s00726-010-0520-8. Epub 2010 Feb 26.
3
SMpred: a support vector machine approach to identify structural motifs in protein structure without using evolutionary information.
J Biomol Struct Dyn. 2010 Dec;28(3):405-14. doi: 10.1080/07391102.2010.10507369.
4
newDNA-Prot: Prediction of DNA-binding proteins by employing support vector machine and a comprehensive sequence representation.
Comput Biol Chem. 2014 Oct;52:51-9. doi: 10.1016/j.compbiolchem.2014.09.002. Epub 2014 Sep 15.
5
Using evolutionary and structural information to predict DNA-binding sites on DNA-binding proteins.
Proteins. 2006 Jul 1;64(1):19-27. doi: 10.1002/prot.20977.
6
Sequence based prediction of DNA-binding proteins based on hybrid feature selection using random forest and Gaussian naïve Bayes.
PLoS One. 2014 Jan 24;9(1):e86703. doi: 10.1371/journal.pone.0086703. eCollection 2014.
7
gDNA-Prot: Predict DNA-binding proteins by employing support vector machine and a novel numerical characterization of protein sequence.
J Theor Biol. 2016 Oct 7;406:8-16. doi: 10.1016/j.jtbi.2016.06.002. Epub 2016 Jul 1.
8
enDNA-Prot: identification of DNA-binding proteins by applying ensemble learning.
Biomed Res Int. 2014;2014:294279. doi: 10.1155/2014/294279. Epub 2014 May 26.
9
An ensemble of reduced alphabets with protein encoding based on grouped weight for predicting DNA-binding proteins.
Amino Acids. 2009 Feb;36(2):167-75. doi: 10.1007/s00726-008-0044-7. Epub 2008 Feb 21.
10
PseDNA-Pro: DNA-Binding Protein Identification by Combining Chou's PseAAC and Physicochemical Distance Transformation.
Mol Inform. 2015 Jan;34(1):8-17. doi: 10.1002/minf.201400025. Epub 2014 Sep 26.

Cited by

1
Benchmarking recent computational tools for DNA-binding protein identification.
Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbae634.
2
Hybrid_DBP: Prediction of DNA-binding proteins using hybrid features and convolutional neural networks.
Front Pharmacol. 2022 Oct 10;13:1031759. doi: 10.3389/fphar.2022.1031759. eCollection 2022.
3
DBP-iDWT: Improving DNA-Binding Proteins Prediction Using Multi-Perspective Evolutionary Profile and Discrete Wavelet Transform.
Comput Intell Neurosci. 2022 Sep 28;2022:2987407. doi: 10.1155/2022/2987407. eCollection 2022.
4
Single-Stranded DNA Binding Proteins and Their Identification Using Machine Learning-Based Approaches.
Biomolecules. 2022 Aug 26;12(9):1187. doi: 10.3390/biom12091187.
5
Comparative Analysis on Alignment-Based and Pretrained Feature Representations for the Identification of DNA-Binding Proteins.
Comput Math Methods Med. 2022 Jun 28;2022:5847242. doi: 10.1155/2022/5847242. eCollection 2022.
6
Identify DNA-Binding Proteins Through the Extreme Gradient Boosting Algorithm.
Front Genet. 2022 Jan 28;12:821996. doi: 10.3389/fgene.2021.821996. eCollection 2021.
7
KK-DBP: A Multi-Feature Fusion Method for DNA-Binding Protein Identification Based on Random Forest.
Front Genet. 2021 Nov 29;12:811158. doi: 10.3389/fgene.2021.811158. eCollection 2021.
8
FTWSVM-SR: DNA-Binding Proteins Identification via Fuzzy Twin Support Vector Machines on Self-Representation.
Interdiscip Sci. 2022 Jun;14(2):372-384. doi: 10.1007/s12539-021-00489-6. Epub 2021 Nov 6.
9
UMAP-DBP: An Improved DNA-Binding Proteins Prediction Method Based on Uniform Manifold Approximation and Projection.
Protein J. 2021 Aug;40(4):562-575. doi: 10.1007/s10930-021-10011-y. Epub 2021 Jun 27.
10
A sequence-based multiple kernel model for identifying DNA-binding proteins.
BMC Bioinformatics. 2021 May 31;22(Suppl 3):291. doi: 10.1186/s12859-020-03875-x.