Simple alignment-free methods for protein classification: a case study from G-protein-coupled receptors.

Author information

Strope Pooja K, Moriyama Etsuko N

Affiliation

Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588-0660

Publication information

Genomics. 2007 May;89(5):602-12. doi: 10.1016/j.ygeno.2007.01.008. Epub 2007 Mar 2.

DOI:10.1016/j.ygeno.2007.01.008

PMID:17336495

Abstract

Computational methods of predicting protein functions rely on detecting similarities among proteins. However, sufficient sequence information is not always available for some protein families. For example, proteins of interest may be new members of a divergent protein family. The performance of protein classification methods could vary in such challenging situations. Using the G-protein-coupled receptor superfamily as an example, we investigated the performance of several protein classifiers. Alignment-free classifiers based on support vector machines using simple amino acid compositions were effective in remote-similarity detection even from short fragmented sequences. Although it is computationally expensive, a support vector machine classifier using local pairwise alignment scores showed very good balanced performance. More commonly used profile hidden Markov models were generally highly specific and well suited to classifying well-established protein family members. It is suggested that different types of protein classifiers should be applied to gain the optimal mining power.
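
As a rough illustration of the "simple amino acid compositions" mentioned above, the following minimal sketch turns a protein sequence into a 20-dimensional vector of residue frequencies. It is an assumption-laden example, not the authors' implementation: the abstract does not specify the exact descriptors, the kernel, or how non-standard residues were handled, and the function name, the choice to skip non-standard residues, and the sample fragment are all hypothetical.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so that every
# sequence maps to a vector with the same layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid in `sequence`.

    Assumption for this sketch: characters outside the 20 standard
    residues (for example 'X' or gap symbols) are ignored.
    """
    residues = [r for r in sequence.upper() if r in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    counts = Counter(residues)
    total = len(residues)
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Hypothetical short fragment, used only to show the call.
features = amino_acid_composition("MKTLLVAGX")
```

Because the vector depends only on residue counts, it can be computed for short or fragmented sequences without any alignment step. Vectors of this kind could then be passed to a support vector machine implementation for training and prediction.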

Similar articles

Protein family classification with partial least squares.

J Proteome Res. 2007 Feb;6(2):846-53. doi: 10.1021/pr060534k.

Protein classification based on text document classification techniques.

Proteins. 2005 Mar 1;58(4):955-70. doi: 10.1002/prot.20373.

Fast model-based protein homology detection without alignment.

Bioinformatics. 2007 Jul 15;23(14):1728-36. doi: 10.1093/bioinformatics/btm247. Epub 2007 May 8.

Proteomic applications of automated GPCR classification.

Proteomics. 2007 Aug;7(16):2800-14. doi: 10.1002/pmic.200700093.

Classifying G-protein coupled receptors with bagging classification tree.

Comput Biol Chem. 2004 Oct;28(4):275-80. doi: 10.1016/j.compbiolchem.2004.08.001.

Prediction of G-protein-coupled receptor classes based on the concept of Chou's pseudo amino acid composition: an approach from discrete wavelet transform.

Anal Biochem. 2009 Jul 1;390(1):68-73. doi: 10.1016/j.ab.2009.04.009. Epub 2009 Apr 11.

HMM-ModE--improved classification using profile hidden Markov models by optimising the discrimination threshold and modifying emission probabilities with negative training sequences.

BMC Bioinformatics. 2007 Mar 27;8:104. doi: 10.1186/1471-2105-8-104.

A novel and efficient technique for identification and classification of GPCRs.

IEEE Trans Inf Technol Biomed. 2008 Jul;12(4):541-8. doi: 10.1109/TITB.2007.911308.

GPCR-MPredictor: multi-level prediction of G protein-coupled receptors using genetic ensemble.

Amino Acids. 2012 May;42(5):1809-23. doi: 10.1007/s00726-011-0902-6. Epub 2011 Apr 20.

Cited by

DeepFam: deep learning based alignment-free method for protein family modeling and prediction.

Bioinformatics. 2018 Jul 1;34(13):i254-i262. doi: 10.1093/bioinformatics/bty275.

Exploring general-purpose protein features for distinguishing enzymes and non-enzymes within the twilight zone.

BMC Bioinformatics. 2017 Jul 21;18(1):349. doi: 10.1186/s12859-017-1758-x.

A coverage criterion for spaced seeds and its applications to support vector machine string kernels and k-mer distances.

J Comput Biol. 2014 Dec;21(12):947-63. doi: 10.1089/cmb.2014.0173.

Exploring the adenylation domain repertoire of nonribosomal peptide synthetases using an ensemble of sequence-search methods.

PLoS One. 2013 Jul 16;8(7):e65926. doi: 10.1371/journal.pone.0065926. Print 2013.

Identification of novel arthropod vector G protein-coupled receptors.

Parasit Vectors. 2013 May 24;6:150. doi: 10.1186/1756-3305-6-150.

Function-based classification of carbohydrate-active enzymes by recognition of short, conserved peptide motifs.

Appl Environ Microbiol. 2013 Jun;79(11):3380-91. doi: 10.1128/AEM.03803-12. Epub 2013 Mar 22.

Testing robustness of relative complexity measure method constructing robust phylogenetic trees for Galanthus L. using the relative complexity measure.

BMC Bioinformatics. 2013 Jan 17;14:20. doi: 10.1186/1471-2105-14-20.

The repertoire of G protein-coupled receptors in the human parasite Schistosoma mansoni and the model organism Schmidtea mediterranea.

BMC Genomics. 2011 Dec 6;12:596. doi: 10.1186/1471-2164-12-596.

An alignment-free approach for eukaryotic ITS2 annotation and phylogenetic inference.

PLoS One. 2011;6(10):e26638. doi: 10.1371/journal.pone.0026638. Epub 2011 Oct 26.

Mining Cytochrome b561 proteins from plant genomes.

Int J Bioinform Res Appl. 2010;6(2):209-21. doi: 10.1504/IJBRA.2010.032122.
