Protein remote homology detection by combining Chou's distance-pair pseudo amino acid composition and principal component analysis.

Author information

Liu Bin, Chen Junjie, Wang Xiaolong

Affiliations

School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, HIT Campus Shenzhen University Town, Xili, Shenzhen, 518055, Guangdong, People's Republic of China.

Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, HIT Campus Shenzhen University Town, Xili, Shenzhen, 518055, Guangdong, People's Republic of China.

Publication information

Mol Genet Genomics. 2015 Oct;290(5):1919-31. doi: 10.1007/s00438-015-1044-4. Epub 2015 Apr 21.

DOI:10.1007/s00438-015-1044-4

PMID:25896721

Abstract

Protein remote homology detection is an important task in computational proteomics, with relevance to both basic research and practical applications. Currently, SVM-based discriminative methods show superior performance. However, existing feature vectors still cannot suitably represent protein sequences, and they often lack an interpretable model for analyzing characteristic features. Previous studies showed that sequence-order effects and physicochemical properties are important for representing protein sequences, but how to use this information to construct predictors remains a challenging problem. In this study, to incorporate sequence-order information and physicochemical properties into the prediction, a method called disPseAAC is proposed, in which the feature vector is constructed by combining the occurrences of amino acid pairs within Chou's pseudo amino acid composition (PseAAC) approach. Predictive performance and computational cost are further improved by applying principal component analysis. Various experiments are conducted on a benchmark dataset. The results show that disPseAAC achieves an ROC score of 0.922, outperforming some existing state-of-the-art methods. Furthermore, the learnt model can easily be analyzed in terms of discriminative features, and the computational cost of the proposed method is much lower than that of other profile-based methods.
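
The idea of counting amino acid pairs at a given sequence distance can be illustrated with a minimal sketch. This is not the authors' disPseAAC implementation: the function name `distance_pair_features` and the parameter `max_distance` are hypothetical, and the sketch assumes that a feature vector is formed from the normalized amino acid composition followed by the normalized counts of ordered residue pairs separated by 1 to `max_distance` positions. The physicochemical terms of PseAAC, the principal component analysis step, and the SVM classifier mentioned in the abstract are omitted.

```python
from itertools import product

# The 20 standard amino acids, in a fixed order that defines feature positions.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def distance_pair_features(sequence, max_distance=2):
    """Illustrative feature vector: composition plus distance-pair frequencies.

    Assumption (not taken from the paper): for each distance d in
    1..max_distance, count ordered pairs (seq[i], seq[i + d]) and
    normalize by the number of such pairs in the sequence.
    """
    # Keep only standard residues so every pair maps to a known index.
    seq = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    n = len(seq)

    # First 20 features: normalized amino acid composition.
    features = [seq.count(aa) / n if n else 0.0 for aa in AMINO_ACIDS]

    # Map each ordered pair such as ('A', 'C') to a position in 0..399.
    pair_index = {pair: i for i, pair in enumerate(product(AMINO_ACIDS, repeat=2))}

    for d in range(1, max_distance + 1):
        counts = [0] * len(pair_index)
        total = max(n - d, 0)  # number of residue pairs at distance d
        for i in range(total):
            counts[pair_index[(seq[i], seq[i + d])]] += 1
        # Next 400 features: normalized pair frequencies at distance d.
        features.extend(c / total if total else 0.0 for c in counts)

    return features
```

Under these assumptions the vector has 20 + 400 × `max_distance` entries, which suggests why a dimensionality reduction step such as principal component analysis can lower the computational cost of a downstream classifier.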

Similar articles

Protein remote homology detection by combining Chou's distance-pair pseudo amino acid composition and principal component analysis.

Mol Genet Genomics. 2015 Oct;290(5):1919-31. doi: 10.1007/s00438-015-1044-4. Epub 2015 Apr 21.

Protein Remote Homology Detection by Combining Chou's Pseudo Amino Acid Composition and Profile-Based Protein Representation.

Mol Inform. 2013 Oct;32(9-10):775-82. doi: 10.1002/minf.201300084. Epub 2013 Jul 24.

Using amino acid physicochemical distance transformation for fast protein remote homology detection.

PLoS One. 2012;7(9):e46633. doi: 10.1371/journal.pone.0046633. Epub 2012 Sep 28.

PseDNA-Pro: DNA-Binding Protein Identification by Combining Chou's PseAAC and Physicochemical Distance Transformation.

Mol Inform. 2015 Jan;34(1):8-17. doi: 10.1002/minf.201400025. Epub 2014 Sep 26.

Prediction of protein subcellular localization with oversampling approach and Chou's general PseAAC.

J Theor Biol. 2018 Jan 21;437:239-250. doi: 10.1016/j.jtbi.2017.10.030. Epub 2017 Oct 31.

DPP-PseAAC: A DNA-binding protein prediction model using Chou's general PseAAC.

J Theor Biol. 2018 Sep 7;452:22-34. doi: 10.1016/j.jtbi.2018.05.006. Epub 2018 May 16.

Prediction of protein structural class for low-similarity sequences using Chou's pseudo amino acid composition and wavelet denoising.

J Mol Graph Model. 2017 Sep;76:260-273. doi: 10.1016/j.jmgm.2017.07.012. Epub 2017 Jul 14.

Predict protein structural class by incorporating two different modes of evolutionary information into Chou's general pseudo amino acid composition.

J Mol Graph Model. 2017 Nov;78:110-117. doi: 10.1016/j.jmgm.2017.10.003. Epub 2017 Oct 7.

Prediction of protein structural classes by Chou's pseudo amino acid composition: approached using continuous wavelet transform and principal component analysis.

Amino Acids. 2009 Jul;37(2):415-25. doi: 10.1007/s00726-008-0170-2. Epub 2008 Aug 23.

Prediction of Protein Subcellular Localization Based on Fusion of Multi-view Features.

Molecules. 2019 Mar 6;24(5):919. doi: 10.3390/molecules24050919.

Cited by

Exploring the limitations of biophysical propensity scales coupled with machine learning for protein sequence analysis.

Sci Rep. 2019 Nov 15;9(1):16932. doi: 10.1038/s41598-019-53324-w.

Identifying Plant Pentatricopeptide Repeat Coding Gene/Protein Using Mixed Feature Extraction Methods.

Front Plant Sci. 2019 Jan 10;9:1961. doi: 10.3389/fpls.2018.01961. eCollection 2018.

Protein remote homology detection based on bidirectional long short-term memory.

BMC Bioinformatics. 2017 Oct 10;18(1):443. doi: 10.1186/s12859-017-1842-2.

Investigation of the inhibition effect and mechanism of myricetin to Suilysin by molecular modeling.

Sci Rep. 2017 Sep 18;7(1):11748. doi: 10.1038/s41598-017-12168-y.

IonchanPred 2.0: A Tool to Predict Ion Channels and Their Types.

Int J Mol Sci. 2017 Aug 24;18(9):1838. doi: 10.3390/ijms18091838.

Prefiltering Model for Homology Detection Algorithms on GPU.

Evol Bioinform Online. 2016 Dec 18;12:313-322. doi: 10.4137/EBO.S40877. eCollection 2016.

Prediction of phosphothreonine sites in human proteins by fusing different features.

Sci Rep. 2016 Oct 4;6:34817. doi: 10.1038/srep34817.

iRSpot-DACC: a computational predictor for recombination hot/cold spots identification based on dinucleotide-based auto-cross covariance.

Sci Rep. 2016 Sep 19;6:33483. doi: 10.1038/srep33483.

dRHP-PseRA: detecting remote homology proteins using profile-based pseudo protein sequence and rank aggregation.

Sci Rep. 2016 Sep 1;6:32333. doi: 10.1038/srep32333.

Protein Remote Homology Detection Based on an Ensemble Learning Approach.

Biomed Res Int. 2016;2016:5813645. doi: 10.1155/2016/5813645. Epub 2016 May 8.

References

PseDNA-Pro: DNA-Binding Protein Identification by Combining Chou's PseAAC and Physicochemical Distance Transformation.

Mol Inform. 2015 Jan;34(1):8-17. doi: 10.1002/minf.201400025. Epub 2014 Sep 26.

Protein Remote Homology Detection by Combining Chou's Pseudo Amino Acid Composition and Profile-Based Protein Representation.

Mol Inform. 2013 Oct;32(9-10):775-82. doi: 10.1002/minf.201300084. Epub 2013 Jul 24.

Identification of real microRNA precursors with a pseudo structure status composition approach.

PLoS One. 2015 Mar 30;10(3):e0121501. doi: 10.1371/journal.pone.0121501. eCollection 2015.

miRNA-dis: microRNA precursor identification based on distance structure status pairs.

Mol Biosyst. 2015 Apr;11(4):1194-204. doi: 10.1039/c5mb00050e.

iMiRNA-PseDPC: microRNA precursor identification with a pseudo distance-pair composition approach.

J Biomol Struct Dyn. 2016;34(1):223-35. doi: 10.1080/07391102.2015.1014422. Epub 2015 Mar 3.

Impacts of bioinformatics to medicinal chemistry.

Med Chem. 2015;11(3):218-34. doi: 10.2174/1573406411666141229162834.

repDNA: a Python package to generate various modes of feature vectors for DNA sequences by incorporating user-defined physicochemical properties and sequence-order effects.

Bioinformatics. 2015 Apr 15;31(8):1307-9. doi: 10.1093/bioinformatics/btu820. Epub 2014 Dec 10.

Improved performance of sequence search approaches in remote homology detection.

F1000Res. 2013 Mar 22;2:93. doi: 10.12688/f1000research.2-93.v2. eCollection 2013.

Molecular science for drug development and biomedicine.

Int J Mol Sci. 2014 Nov 4;15(11):20072-8. doi: 10.3390/ijms151120072.

iPro54-PseKNC: a sequence-based predictor for identifying sigma-54 promoters in prokaryote with pseudo k-tuple nucleotide composition.

Nucleic Acids Res. 2014 Dec 1;42(21):12961-72. doi: 10.1093/nar/gku1019. Epub 2014 Oct 31.
