enDNA-Prot: identification of DNA-binding proteins by applying ensemble learning.

Authors

Xu Ruifeng, Zhou Jiyun, Liu Bin, Yao Lin, He Yulan, Zou Quan, Wang Xiaolong

Affiliations

School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China; Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China.

School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China.

Publication information

Biomed Res Int. 2014;2014:294279. doi: 10.1155/2014/294279. Epub 2014 May 26.

DOI:10.1155/2014/294279
PMID:24977146
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4058174/
Abstract

DNA-binding proteins are crucial for various cellular processes, such as recognition of specific nucleotides, regulation of transcription, and regulation of gene expression. Developing an effective model for identifying DNA-binding proteins is an urgent research problem. Many methods have been proposed so far, but most rely on a single classifier and cannot make full use of the large number of negative samples to improve prediction performance. This study proposes a predictor called enDNA-Prot, which identifies DNA-binding proteins by employing ensemble learning. Experimental results showed that enDNA-Prot was comparable with DNA-Prot and outperformed DNAbinder and iDNA-Prot, with improvements of 3.97-9.52% in accuracy (ACC) and 0.08-0.19 in Matthews correlation coefficient (MCC). Furthermore, when the benchmark dataset was expanded with negative samples, enDNA-Prot outperformed the three existing methods by 2.83-16.63% in ACC and 0.02-0.16 in MCC. These results indicate that enDNA-Prot is an effective method for DNA-binding protein identification and that expanding the training dataset with negative samples can improve its performance. For the convenience of experimental scientists, we developed a user-friendly web server for enDNA-Prot that is freely accessible to the public.
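To make the reported quantities concrete, the following is a minimal sketch of how predictions from several base classifiers might be combined and then scored with ACC and MCC. The abstract does not describe how enDNA-Prot combines its classifiers, so the simple majority vote here is an assumption chosen for illustration, and the function names (`majority_vote`, `confusion`, `accuracy`, `mcc`) and the toy labels are hypothetical. The ACC and MCC formulas are the standard confusion-matrix definitions.

```python
import math
from collections import Counter


def majority_vote(votes):
    """Combine labels from several classifiers (1 = DNA-binding, 0 = not).

    Assumption: plain majority voting; ties fall back to the negative class.
    """
    counts = Counter(votes)
    return 1 if counts[1] > counts[0] else 0


def confusion(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """ACC = (TP + TN) / total number of samples."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy data: three hypothetical base classifiers, six proteins.
y_true = [1, 1, 1, 0, 0, 0]
base_predictions = [
    [1, 1, 0, 0, 0, 1],
    [1, 0, 1, 0, 1, 0],
    [1, 1, 0, 0, 0, 0],
]

# Vote per protein: zip(*...) turns rows of classifiers into columns of votes.
y_pred = [majority_vote(votes) for votes in zip(*base_predictions)]
tp, tn, fp, fn = confusion(y_true, y_pred)

print("ensemble prediction:", y_pred)
print("ACC = %.4f, MCC = %.4f" % (accuracy(tp, tn, fp, fn), mcc(tp, tn, fp, fn)))
```

In this toy case each base classifier makes mistakes on different proteins, and the vote corrects most of them. MCC is often reported alongside ACC because it stays informative when negatives greatly outnumber positives, which is the setting the abstract describes.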

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4ec1/4058174/8150200940e5/BMRI2014-294279.alg.001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4ec1/4058174/ba3bd4a15411/BMRI2014-294279.001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4ec1/4058174/a1062c022a2e/BMRI2014-294279.002.jpg

Similar articles

1
enDNA-Prot: identification of DNA-binding proteins by applying ensemble learning.
Biomed Res Int. 2014;2014:294279. doi: 10.1155/2014/294279. Epub 2014 May 26.
2
iDNA-Prot: identification of DNA binding proteins using random forest with grey model.
PLoS One. 2011;6(9):e24756. doi: 10.1371/journal.pone.0024756. Epub 2011 Sep 15.
3
iDNA-Prot|dis: identifying DNA-binding proteins by incorporating amino acid distance-pairs and reduced alphabet profile into the general pseudo amino acid composition.
PLoS One. 2014 Sep 3;9(9):e106691. doi: 10.1371/journal.pone.0106691. eCollection 2014.
4
Protein Sequence Comparison and DNA-binding Protein Identification with Generalized PseAAC and Graphical Representation.
Comb Chem High Throughput Screen. 2018;21(2):100-110. doi: 10.2174/1386207321666180130100838.
5
nDNA-Prot: identification of DNA-binding proteins based on unbalanced classification.
BMC Bioinformatics. 2014 Sep 8;15(1):298. doi: 10.1186/1471-2105-15-298.
6
gDNA-Prot: Predict DNA-binding proteins by employing support vector machine and a novel numerical characterization of protein sequence.
J Theor Biol. 2016 Oct 7;406:8-16. doi: 10.1016/j.jtbi.2016.06.002. Epub 2016 Jul 1.
7
DNA-Prot: identification of DNA binding proteins from protein sequence information using random forest.
J Biomol Struct Dyn. 2009 Jun;26(6):679-86. doi: 10.1080/07391102.2009.10507281.
8
Sequence based prediction of DNA-binding proteins based on hybrid feature selection using random forest and Gaussian naïve Bayes.
PLoS One. 2014 Jan 24;9(1):e86703. doi: 10.1371/journal.pone.0086703. eCollection 2014.
9
newDNA-Prot: Prediction of DNA-binding proteins by employing support vector machine and a comprehensive sequence representation.
Comput Biol Chem. 2014 Oct;52:51-9. doi: 10.1016/j.compbiolchem.2014.09.002. Epub 2014 Sep 15.
10
Identifying DNA-binding proteins by combining support vector machine and PSSM distance transformation.
BMC Syst Biol. 2015;9 Suppl 1(Suppl 1):S10. doi: 10.1186/1752-0509-9-S1-S10. Epub 2015 Feb 6.

Cited by

1
Accurate prediction of nucleic acid binding proteins using protein language model.
Bioinform Adv. 2025 Jan 20;5(1):vbaf008. doi: 10.1093/bioadv/vbaf008. eCollection 2025.
2
Improved prediction of DNA and RNA binding proteins with deep learning models.
Brief Bioinform. 2024 May 23;25(4). doi: 10.1093/bib/bbae285.
3
ProkDBP: Toward more precise identification of prokaryotic DNA binding proteins.
Protein Sci. 2024 Jun;33(6):e5015. doi: 10.1002/pro.5015.
4
Deep-WET: a deep learning-based approach for predicting DNA-binding proteins using word embedding techniques with weighted features.
Sci Rep. 2024 Feb 5;14(1):2961. doi: 10.1038/s41598-024-52653-9.
5
Identifying Plant Pentatricopeptide Repeat Proteins Using a Variable Selection Method.
Front Plant Sci. 2021 Mar 1;12:506681. doi: 10.3389/fpls.2021.506681. eCollection 2021.
6
A Method for Prediction of Thermophilic Protein Based on Reduced Amino Acids and Mixed Features.
Front Bioeng Biotechnol. 2020 May 5;8:285. doi: 10.3389/fbioe.2020.00285. eCollection 2020.
7
Prediction of RNA- and DNA-Binding Proteins Using Various Machine Learning Classifiers.
Avicenna J Med Biotechnol. 2019 Jan-Mar;11(1):104-111.
8
Identifying Plant Pentatricopeptide Repeat Coding Gene/Protein Using Mixed Feature Extraction Methods.
Front Plant Sci. 2019 Jan 10;9:1961. doi: 10.3389/fpls.2018.01961. eCollection 2018.
9
Using a Classifier Fusion Strategy to Identify Anti-angiogenic Peptides.
Sci Rep. 2018 Sep 14;8(1):14062. doi: 10.1038/s41598-018-32443-w.
10
A Model Stacking Framework for Identifying DNA Binding Proteins by Orchestrating Multi-View Features and Classifiers.
Genes (Basel). 2018 Aug 1;9(8):394. doi: 10.3390/genes9080394.

References

1
Protein Remote Homology Detection by Combining Chou's Pseudo Amino Acid Composition and Profile-Based Protein Representation.
Mol Inform. 2013 Oct;32(9-10):775-82. doi: 10.1002/minf.201300084. Epub 2013 Jul 24.
2
Using distances between Top-n-gram and residue pairs for protein remote homology detection.
BMC Bioinformatics. 2014;15 Suppl 2(Suppl 2):S3. doi: 10.1186/1471-2105-15-S2-S3. Epub 2014 Jan 24.
3
QChIPat: a quantitative method to identify distinct binding patterns for two biological ChIP-seq samples in different experimental conditions.
BMC Genomics. 2013;14 Suppl 8(Suppl 8):S3. doi: 10.1186/1471-2164-14-S8-S3. Epub 2013 Dec 9.
4
Combining evolutionary information extracted from frequency profiles with sequence-based kernels for protein remote homology detection.
Bioinformatics. 2014 Feb 15;30(4):472-9. doi: 10.1093/bioinformatics/btt709. Epub 2013 Dec 5.
5
iSNO-AAPair: incorporating amino acid pairwise coupling into PseAAC for predicting cysteine S-nitrosylation sites in proteins.
PeerJ. 2013 Oct 3;1:e171. doi: 10.7717/peerj.171. eCollection 2013.
6
Hierarchical classification of protein folds using a novel ensemble classifier.
PLoS One. 2013;8(2):e56499. doi: 10.1371/journal.pone.0056499. Epub 2013 Feb 20.
7
iAMP-2L: a two-level multi-label classifier for identifying antimicrobial peptides and their functional types.
Anal Biochem. 2013 May 15;436(2):168-77. doi: 10.1016/j.ab.2013.01.019. Epub 2013 Feb 6.
8
iRSpot-PseDNC: identify recombination spots with pseudo dinucleotide composition.
Nucleic Acids Res. 2013 Apr 1;41(6):e68. doi: 10.1093/nar/gks1450. Epub 2013 Jan 8.
9
iNuc-PhysChem: a sequence-based predictor for identifying nucleosomes via physicochemical properties.
PLoS One. 2012;7(10):e47843. doi: 10.1371/journal.pone.0047843. Epub 2012 Oct 29.
10
Using amino acid physicochemical distance transformation for fast protein remote homology detection.
PLoS One. 2012;7(9):e46633. doi: 10.1371/journal.pone.0046633. Epub 2012 Sep 28.