DNA binding protein identification by combining pseudo amino acid composition and profile-based protein representation.

Author information

Liu Bin, Wang Shanyi, Wang Xiaolong

Affiliations

School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China.

Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China.

Publication information

Sci Rep. 2015 Oct 20;5:15479. doi: 10.1038/srep15479.

DOI: 10.1038/srep15479

PMID: 26482832

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4611492/

Abstract

DNA-binding proteins play an important role in most cellular processes. Therefore, it is necessary to develop an efficient predictor that identifies DNA-binding proteins based only on protein sequence information. The bottleneck in constructing a useful predictor is finding suitable features that capture the characteristics of DNA-binding proteins. We applied pseudo amino acid composition (PseAAC) to DNA-binding protein identification and further improved it by incorporating evolutionary information through a profile-based protein representation. Combined with Support Vector Machines (SVMs), this yielded a predictor called iDNAPro-PseAAC. Experimental results on an updated benchmark dataset showed that iDNAPro-PseAAC outperformed some state-of-the-art approaches and achieved stable performance on an independent dataset. Using an ensemble learning approach to incorporate more negative samples (non-DNA-binding proteins) in the training process further improved the performance of iDNAPro-PseAAC. The web server of iDNAPro-PseAAC is available at http://bioinformatics.hitsz.edu.cn/iDNAPro-PseAAC/.
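
To make the feature idea concrete, the following is a minimal sketch of a classic type-1 PseAAC vector computed directly from a raw sequence. It is not the authors' implementation: the abstract does not specify the properties or parameters used, and iDNAPro-PseAAC derives its features from a profile-based representation rather than the raw sequence. The function name `pse_aac`, the choice of a single Kyte-Doolittle hydropathy scale, and the defaults `lam=3` and `w=0.05` are all assumptions made for illustration.

```python
from collections import Counter
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values, used here as the only physicochemical
# property (an assumption; the abstract does not name the properties used).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def standardize(scale):
    """Rescale a property to zero mean and unit variance over the 20 residues."""
    mu = mean(scale.values())
    sd = pstdev(scale.values())
    return {aa: (value - mu) / sd for aa, value in scale.items()}


def pse_aac(seq, lam=3, w=0.05, scale=KYTE_DOOLITTLE):
    """Return a (20 + lam)-dimensional type-1 PseAAC vector for `seq`.

    lam: highest sequence-order correlation tier (hypothetical default).
    w:   weight of the correlation terms (hypothetical default).
    """
    seq = seq.upper()
    if any(aa not in scale for aa in seq):
        raise ValueError("sequence contains a non-standard residue")
    if len(seq) <= lam:
        raise ValueError("sequence must be longer than lam")

    h = standardize(scale)

    # The 20 conventional amino acid composition terms.
    counts = Counter(seq)
    freqs = [counts[aa] / len(seq) for aa in AMINO_ACIDS]

    # Tier-j correlation factor: mean squared property difference between
    # residues that are j positions apart.
    thetas = [
        mean((h[a] - h[b]) ** 2 for a, b in zip(seq, seq[j:]))
        for j in range(1, lam + 1)
    ]

    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


features = pse_aac("MKRLAVDEQGHKSTWYFCNP", lam=3)
print(len(features))  # 20 composition terms + 3 correlation terms
```

Vectors of this kind have a fixed length regardless of sequence length, which is what allows them to be passed to a standard classifier such as an SVM.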

Similar articles

PseDNA-Pro: DNA-Binding Protein Identification by Combining Chou's PseAAC and Physicochemical Distance Transformation.

Mol Inform. 2015 Jan;34(1):8-17. doi: 10.1002/minf.201400025. Epub 2014 Sep 26.

Protein Sequence Comparison and DNA-binding Protein Identification with Generalized PseAAC and Graphical Representation.

Comb Chem High Throughput Screen. 2018;21(2):100-110. doi: 10.2174/1386207321666180130100838.

DPP-PseAAC: A DNA-binding protein prediction model using Chou's general PseAAC.

J Theor Biol. 2018 Sep 7;452:22-34. doi: 10.1016/j.jtbi.2018.05.006. Epub 2018 May 16.

iDNA-Prot|dis: identifying DNA-binding proteins by incorporating amino acid distance-pairs and reduced alphabet profile into the general pseudo amino acid composition.

PLoS One. 2014 Sep 3;9(9):e106691. doi: 10.1371/journal.pone.0106691. eCollection 2014.

PSFM-DBT: Identifying DNA-Binding Proteins by Combing Position Specific Frequency Matrix and Distance-Bigram Transformation.

Int J Mol Sci. 2017 Aug 25;18(9):1856. doi: 10.3390/ijms18091856.

Identify DNA-binding proteins with optimal Chou's amino acid composition.

Protein Pept Lett. 2012 Apr;19(4):398-405. doi: 10.2174/092986612799789404.

iMiRNA-SSF: Improving the Identification of MicroRNA Precursors by Combining Negative Sets with Different Distributions.

Sci Rep. 2016 Jan 12;6:19062. doi: 10.1038/srep19062.

iPPI-PseAAC(CGR): Identify protein-protein interactions by incorporating chaos game representation into PseAAC.

J Theor Biol. 2019 Jan 7;460:195-203. doi: 10.1016/j.jtbi.2018.10.021. Epub 2018 Oct 9.

iCataly-PseAAC: Identification of Enzymes Catalytic Sites Using Sequence Evolution Information with Grey Model GM (2,1).

J Membr Biol. 2015 Dec;248(6):1033-41. doi: 10.1007/s00232-015-9815-8. Epub 2015 Jun 16.

Cited by

DRBP-EDP: classification of DNA-binding proteins and RNA-binding proteins using ESM-2 and dual-path neural network.

NAR Genom Bioinform. 2025 May 19;7(2):lqaf058. doi: 10.1093/nargab/lqaf058. eCollection 2025 Jun.

Systematic discovery of DNA-binding tandem repeat proteins.

Nucleic Acids Res. 2024 Sep 23;52(17):10464-10489. doi: 10.1093/nar/gkae710.

LGC-DBP: the method of DNA-binding protein identification based on PSSM and deep learning.

Front Genet. 2024 Jun 5;15:1411847. doi: 10.3389/fgene.2024.1411847. eCollection 2024.

ProkDBP: Toward more precise identification of prokaryotic DNA binding proteins.

Protein Sci. 2024 Jun;33(6):e5015. doi: 10.1002/pro.5015.

Protein feature engineering framework for AMPylation site prediction.

Sci Rep. 2024 Apr 15;14(1):8695. doi: 10.1038/s41598-024-58450-8.

HormoNet: a deep learning approach for hormone-drug interaction prediction.

BMC Bioinformatics. 2024 Feb 28;25(1):87. doi: 10.1186/s12859-024-05708-7.

Deep-WET: a deep learning-based approach for predicting DNA-binding proteins using word embedding techniques with weighted features.

Sci Rep. 2024 Feb 5;14(1):2961. doi: 10.1038/s41598-024-52653-9.

Hybrid_DBP: Prediction of DNA-binding proteins using hybrid features and convolutional neural networks.

Front Pharmacol. 2022 Oct 10;13:1031759. doi: 10.3389/fphar.2022.1031759. eCollection 2022.

Comparative Analysis on Alignment-Based and Pretrained Feature Representations for the Identification of DNA-Binding Proteins.

Comput Math Methods Med. 2022 Jun 28;2022:5847242. doi: 10.1155/2022/5847242. eCollection 2022.

Research on DNA-Binding Protein Identification Method Based on LSTM-CNN Feature Fusion.

Comput Math Methods Med. 2022 Jun 2;2022:9705275. doi: 10.1155/2022/9705275. eCollection 2022.
