PseDNA-Pro: DNA-Binding Protein Identification by Combining Chou's PseAAC and Physicochemical Distance Transformation.

Authors

Liu Bin, Xu Jinghao, Fan Shixi, Xu Ruifeng, Zhou Jiyun, Wang Xiaolong

Affiliations

School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, P.R. China.

Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, P.R. China.

Publication information

Mol Inform. 2015 Jan;34(1):8-17. doi: 10.1002/minf.201400025. Epub 2014 Sep 26.

DOI:10.1002/minf.201400025

PMID:27490858

Abstract

Identifying DNA-binding proteins is an important problem in biomedical research because these proteins are crucial for various cellular processes. Currently, machine learning methods achieve state-of-the-art performance using different features. A key step in improving these methods is finding a suitable representation of proteins. In this study, we proposed a feature vector composed of three kinds of sequence-based features: overall amino acid composition, the pseudo amino acid composition (PseAAC) proposed by Chou, and physicochemical distance transformation. These features capture not only the sequence composition of proteins but also the sequence-order information of their amino acids. The feature vectors were fed into a Support Vector Machine (SVM) for DNA-binding protein identification. The proposed method is called PseDNA-Pro. Experiments on stringent benchmark datasets and independent test datasets using the jackknife test showed that PseDNA-Pro can achieve an accuracy higher than 80%, outperforming several state-of-the-art methods, including DNAbinder, DNA-Prot, and iDNA-Prot. These results indicate that combining various features is a suitable approach for DNA-binding protein prediction, and that the sequence-order information among residues in proteins is relevant for discrimination. For practical applications, a web server for PseDNA-Pro was established, available at http://bioinformatics.hitsz.edu.cn/PseDNA-Pro/.
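To make the feature construction more concrete, the following is a minimal sketch of two of the three feature types named in the abstract: overall amino acid composition and a single-property, type-1 style PseAAC. It is not the authors' implementation. The function names, the `max_lag` and `weight` values, the toy sequence, and the use of the Kyte-Doolittle hydropathy scale as the only physicochemical property are all assumptions made for illustration; the physicochemical distance transformation is not sketched because the abstract does not define it.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values. Assumption: used here as a stand-in
# property; the abstract does not say which properties PseDNA-Pro uses.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def amino_acid_composition(seq):
    """Return the 20 normalized residue frequencies (sequence composition)."""
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in AMINO_ACIDS]


def standardize(scale):
    """Rescale a property to zero mean and unit variance over the 20 residues."""
    values = [scale[a] for a in AMINO_ACIDS]
    mean = sum(values) / len(values)
    sd = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return {a: (scale[a] - mean) / sd for a in AMINO_ACIDS}


def sequence_order_factors(seq, prop, max_lag):
    """theta_k: mean squared property difference between residues k apart."""
    n = len(seq)
    return [
        sum((prop[seq[i]] - prop[seq[i + k]]) ** 2 for i in range(n - k)) / (n - k)
        for k in range(1, max_lag + 1)
    ]


def pseaac(seq, max_lag=3, weight=0.05):
    """Hypothetical helper: 20 composition terms plus max_lag order terms."""
    # Drop anything that is not one of the 20 standard residues.
    seq = "".join(ch for ch in seq.upper() if ch in HYDROPATHY)
    if len(seq) <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    freqs = amino_acid_composition(seq)
    thetas = sequence_order_factors(seq, standardize(HYDROPATHY), max_lag)
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]


if __name__ == "__main__":
    toy_sequence = "MKRAVLDECGHIKLMNPQRSTVWYF"  # made-up example, not real data
    print(len(pseaac(toy_sequence)))
```

The first 20 components reflect composition and the remaining `max_lag` components carry sequence-order information, which is the distinction the abstract draws. A vector of this kind could then be passed to any SVM implementation as one training example.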

Similar articles

DNA binding protein identification by combining pseudo amino acid composition and profile-based protein representation.

Sci Rep. 2015 Oct 20;5:15479. doi: 10.1038/srep15479.

DPP-PseAAC: A DNA-binding protein prediction model using Chou's general PseAAC.

J Theor Biol. 2018 Sep 7;452:22-34. doi: 10.1016/j.jtbi.2018.05.006. Epub 2018 May 16.

Identifying DNA-binding proteins by combining support vector machine and PSSM distance transformation.

BMC Syst Biol. 2015;9 Suppl 1(Suppl 1):S10. doi: 10.1186/1752-0509-9-S1-S10. Epub 2015 Feb 6.

iDNA-Prot|dis: identifying DNA-binding proteins by incorporating amino acid distance-pairs and reduced alphabet profile into the general pseudo amino acid composition.

PLoS One. 2014 Sep 3;9(9):e106691. doi: 10.1371/journal.pone.0106691. eCollection 2014.

Protein Remote Homology Detection by Combining Chou's Pseudo Amino Acid Composition and Profile-Based Protein Representation.

Mol Inform. 2013 Oct;32(9-10):775-82. doi: 10.1002/minf.201300084. Epub 2013 Jul 24.

Protein remote homology detection by combining Chou's distance-pair pseudo amino acid composition and principal component analysis.

Mol Genet Genomics. 2015 Oct;290(5):1919-31. doi: 10.1007/s00438-015-1044-4. Epub 2015 Apr 21.

gDNA-Prot: Predict DNA-binding proteins by employing support vector machine and a novel numerical characterization of protein sequence.

J Theor Biol. 2016 Oct 7;406:8-16. doi: 10.1016/j.jtbi.2016.06.002. Epub 2016 Jul 1.

Protein Sequence Comparison and DNA-binding Protein Identification with Generalized PseAAC and Graphical Representation.

Comb Chem High Throughput Screen. 2018;21(2):100-110. doi: 10.2174/1386207321666180130100838.

newDNA-Prot: Prediction of DNA-binding proteins by employing support vector machine and a comprehensive sequence representation.

Comput Biol Chem. 2014 Oct;52:51-9. doi: 10.1016/j.compbiolchem.2014.09.002. Epub 2014 Sep 15.

Cited by

Benchmarking recent computational tools for DNA-binding protein identification.

Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbae634.

LGC-DBP: the method of DNA-binding protein identification based on PSSM and deep learning.

Front Genet. 2024 Jun 5;15:1411847. doi: 10.3389/fgene.2024.1411847. eCollection 2024.

ProkDBP: Toward more precise identification of prokaryotic DNA binding proteins.

Protein Sci. 2024 Jun;33(6):e5015. doi: 10.1002/pro.5015.

Deep-WET: a deep learning-based approach for predicting DNA-binding proteins using word embedding techniques with weighted features.

Sci Rep. 2024 Feb 5;14(1):2961. doi: 10.1038/s41598-024-52653-9.

Comparative Analysis on Alignment-Based and Pretrained Feature Representations for the Identification of DNA-Binding Proteins.

Comput Math Methods Med. 2022 Jun 28;2022:5847242. doi: 10.1155/2022/5847242. eCollection 2022.

Research on DNA-Binding Protein Identification Method Based on LSTM-CNN Feature Fusion.

Comput Math Methods Med. 2022 Jun 2;2022:9705275. doi: 10.1155/2022/9705275. eCollection 2022.

Identify DNA-Binding Proteins Through the Extreme Gradient Boosting Algorithm.

Front Genet. 2022 Jan 28;12:821996. doi: 10.3389/fgene.2021.821996. eCollection 2021.

Application of DNA-Binding Protein Prediction Based on Graph Convolutional Network and Contact Map.

Biomed Res Int. 2022 Jan 17;2022:9044793. doi: 10.1155/2022/9044793. eCollection 2022.

BioSeq-BLM: a platform for analyzing DNA, RNA and protein sequences based on biological language models.

Nucleic Acids Res. 2021 Dec 16;49(22):e129. doi: 10.1093/nar/gkab829.

A sequence-based multiple kernel model for identifying DNA-binding proteins.

BMC Bioinformatics. 2021 May 31;22(Suppl 3):291. doi: 10.1186/s12859-020-03875-x.
