

PseDNA-Pro: DNA-Binding Protein Identification by Combining Chou's PseAAC and Physicochemical Distance Transformation.

Authors

Liu Bin, Xu Jinghao, Fan Shixi, Xu Ruifeng, Zhou Jiyun, Wang Xiaolong

Affiliations

School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, P.R. China.

Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, P.R. China.

Publication information

Mol Inform. 2015 Jan;34(1):8-17. doi: 10.1002/minf.201400025. Epub 2014 Sep 26.

Abstract

Identification of DNA-binding proteins is an important problem in biomedical research, as DNA-binding proteins are crucial for various cellular processes. Currently, machine learning methods using different features achieve state-of-the-art performance. A key step in improving the performance of these methods is finding a suitable representation of proteins. In this study, we proposed a feature vector composed of three kinds of sequence-based features: overall amino acid composition, pseudo amino acid composition (PseAAC) as proposed by Chou, and physicochemical distance transformation. These features consider not only the sequence composition of proteins but also the sequence-order information of their amino acids. The feature vectors were fed into a Support Vector Machine (SVM) for DNA-binding protein identification. The proposed method is called PseDNA-Pro. Experiments on stringent benchmark datasets and independent test datasets using the jackknife test showed that PseDNA-Pro achieves an accuracy higher than 80%, outperforming several state-of-the-art methods, including DNAbinder, DNA-Prot, and iDNA-Prot. These results indicate that combining various features is a suitable approach for DNA-binding protein prediction, and that the sequence-order information among residues in proteins is relevant for discrimination. For practical applications, a PseDNA-Pro web server was established, which is available at http://bioinformatics.hitsz.edu.cn/PseDNA-Pro/.
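To make the idea of combining composition with sequence-order information concrete, the following is a minimal sketch of two of the three feature types named in the abstract: overall amino acid composition and a PseAAC-style vector in the general form of Chou's formulation. It is not the authors' implementation. The function names, the choice of the Kyte-Doolittle hydropathy scale as the single physicochemical property, and the values `lam=3` and `weight=0.05` are all assumptions made for illustration; the abstract does not state which properties or parameters PseDNA-Pro uses, and the physicochemical distance transformation is not shown because the abstract does not define it.

```python
from collections import Counter
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values. Assumption: used here only as an example
# property scale; the abstract does not say which properties PseDNA-Pro uses.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def check_sequence(seq):
    """Reject empty sequences and residues outside the 20 standard amino acids."""
    if not seq or any(residue not in AMINO_ACIDS for residue in seq):
        raise ValueError("sequence must be non-empty and use the 20 standard residues")


def standardize(scale):
    """Rescale a property to zero mean and unit variance over the 20 residues."""
    values = [scale[a] for a in AMINO_ACIDS]
    mu, sigma = mean(values), pstdev(values)
    return {a: (scale[a] - mu) / sigma for a in AMINO_ACIDS}


def amino_acid_composition(seq):
    """Return the 20 residue frequencies (order of AMINO_ACIDS); they sum to 1."""
    check_sequence(seq)
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in AMINO_ACIDS]


def sequence_order_factors(seq, scale, lam):
    """Mean squared property difference between residues k apart, k = 1..lam."""
    h = standardize(scale)
    factors = []
    for k in range(1, lam + 1):
        diffs = [(h[seq[i]] - h[seq[i + k]]) ** 2 for i in range(len(seq) - k)]
        factors.append(sum(diffs) / len(diffs))
    return factors


def pse_aac(seq, scale=HYDROPATHY, lam=3, weight=0.05):
    """Return a (20 + lam)-dimensional PseAAC-style vector that sums to 1."""
    check_sequence(seq)
    if len(seq) <= lam:
        raise ValueError("sequence must be longer than lam")
    aac = amino_acid_composition(seq)
    theta = sequence_order_factors(seq, scale, lam)
    denom = sum(aac) + weight * sum(theta)
    return [f / denom for f in aac] + [weight * t / denom for t in theta]
```

Calling `pse_aac("MKRLSTAVDEGHK")` on this made-up peptide yields a 23-dimensional vector: the first 20 entries carry composition and the last 3 carry sequence-order information. In a pipeline like the one the abstract describes, vectors of this kind would be concatenated with the remaining features and passed to an SVM classifier.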

