A support vector machine approach to the identification of phosphorylation sites.

Author information

Plewczyński Dariusz, Tkacz Adrian, Godzik Adam, Rychlewski Leszek

Affiliation

BioInfoBank Institute, Limanowskiego 24A/16, 60-744 Poznań, Poland.

Publication information

Cell Mol Biol Lett. 2005;10(1):73-89.

PMID:15809681

Abstract

We describe a bioinformatics tool that can be used to predict the position of phosphorylation sites in proteins based only on sequence information. The method uses support vector machine (SVM) statistical learning theory. The statistical models for phosphorylation by various types of kinases are built using a dataset of short (9-amino-acid-long) sequence fragments. The sequence segments are dissected around post-translationally modified sites of proteins that are in the current release of the Swiss-Prot database and that were experimentally confirmed to be phosphorylated by any kinase. We represent them as vectors in a multidimensional abstract space of short sequence fragments. The prediction method is as follows. First, a given query protein sequence is dissected into overlapping short segments. All the fragments are then projected into the multidimensional space of sequence fragments via a collection of different representations. Those points are classified with pre-built statistical models (the SVM method with linear, polynomial and radial kernel functions) as either phosphorylated or inactive. The resulting list of plausible sites for phosphorylation by various types of kinases in the query protein is returned to the user. The efficiency of the method for each type of phosphorylation is estimated using leave-one-out tests and presented here. The sensitivities of the models can exceed 70%, depending on the type of kinase. The additional information from profile representations of short sequence fragments helps achieve higher accuracy for some phosphorylation types. Further development of an automatic phosphorylation site annotation predictor based on our algorithm should yield a significant improvement when statistical algorithms are used to quantify the results.
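
The prediction steps described in the abstract can be sketched in plain Python. The sketch covers cutting a query into overlapping 9-residue fragments, mapping each fragment to a vector, comparing vectors with linear, polynomial or radial kernels, and scoring predictions by sensitivity. It is a minimal illustration under stated assumptions and is not the authors' implementation. The abstract does not give any of the following, so they are assumed: candidate residues S/T/Y, 'X' padding at the termini, a one-hot encoding, and the kernel parameter values. The profile-based representations mentioned in the abstract are not shown. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # 20 standard residues
ALPHABET = AMINO_ACIDS + "X"           # assumption: 'X' marks padding / non-standard residues
WINDOW = 9                             # fragment length stated in the abstract
HALF = WINDOW // 2


def extract_windows(sequence: str, targets: str = "STY") -> List[Tuple[int, str]]:
    """Cut a query sequence into overlapping 9-residue fragments, one per candidate residue.

    Returns (1-based position, fragment) pairs with the candidate residue in the centre.
    The S/T/Y targets and 'X' padding at the termini are assumptions of this sketch.
    """
    seq = sequence.upper()
    padded = "X" * HALF + seq + "X" * HALF
    windows = []
    for i, residue in enumerate(seq):
        if residue in targets:
            # residue i sits at index i + HALF in `padded`, so its centred window starts at i
            windows.append((i + 1, padded[i:i + WINDOW]))
    return windows


def one_hot(fragment: str) -> List[int]:
    """Project a fragment into a binary vector space: one 21-bit block per position."""
    vector: List[int] = []
    for residue in fragment:
        bits = [0] * len(ALPHABET)
        # Unknown letters (e.g. 'B', 'U') fall back to the 'X' column
        bits[ALPHABET.index(residue) if residue in ALPHABET else ALPHABET.index("X")] = 1
        vector.extend(bits)
    return vector


# Standard forms of the three kernel families named in the abstract
# (gamma, coef0 and degree defaults are illustrative, not the paper's settings).
def linear_kernel(x: Sequence[float], y: Sequence[float]) -> float:
    return float(sum(a * b for a, b in zip(x, y)))


def polynomial_kernel(x: Sequence[float], y: Sequence[float],
                      gamma: float = 1.0, coef0: float = 1.0, degree: int = 3) -> float:
    return (gamma * linear_kernel(x, y) + coef0) ** degree


def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: float = 0.05) -> float:
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def sensitivity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of true phosphorylation sites (label 1) that were predicted as sites."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if tp + fn else 0.0


if __name__ == "__main__":
    # Hypothetical toy sequence, not taken from the paper's dataset
    for position, fragment in extract_windows("MKSPEYT"):
        print(position, fragment, len(one_hot(fragment)))
```

Running the script prints one line per candidate residue, for example `3 XXMKSPEYT 189` for the serine at position 3 (9 positions × 21 symbols = 189 dimensions). The sketch leaves out training and leave-one-out evaluation. One option is scikit-learn's `SVC` with `kernel="linear"`, `"poly"` or `"rbf"`, combined with `LeaveOneOut` from `sklearn.model_selection`, though this is not necessarily what the authors used.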

Similar articles

A support vector machine approach to the identification of phosphorylation sites.

Cell Mol Biol Lett. 2005;10(1):73-89.

GANNPhos: a new phosphorylation site predictor based on a genetic algorithm integrated neural network.

Protein Eng Des Sel. 2007 Aug;20(8):405-12. doi: 10.1093/protein/gzm035. Epub 2007 Jul 24.

Support-vector-machine classification of linear functional motifs in proteins.

J Mol Model. 2006 Mar;12(4):453-61. doi: 10.1007/s00894-005-0070-2. Epub 2005 Dec 10.

Prediction of protein structure class by coupling improved genetic algorithm and support vector machine.

Amino Acids. 2008 Oct;35(3):581-90. doi: 10.1007/s00726-008-0084-z. Epub 2008 Apr 22.

Prediction of protein subcellular localization.

Proteins. 2006 Aug 15;64(3):643-51. doi: 10.1002/prot.21018.

AutoMotif server: prediction of single residue post-translational modifications in proteins.

Bioinformatics. 2005 May 15;21(10):2525-7. doi: 10.1093/bioinformatics/bti333. Epub 2005 Feb 22.

Identification of catalytic residues from protein structure using support vector machine with sequence and structural features.

Biochem Biophys Res Commun. 2008 Mar 14;367(3):630-4. doi: 10.1016/j.bbrc.2008.01.038. Epub 2008 Jan 17.

Use of estimated evolutionary strength at the codon level improves the prediction of disease-related protein mutations in humans.

Hum Mutat. 2008 Jan;29(1):198-204. doi: 10.1002/humu.20628.

Weighted-support vector machines for predicting membrane protein types based on pseudo-amino acid composition.

Protein Eng Des Sel. 2004 Jun;17(6):509-16. doi: 10.1093/protein/gzh061. Epub 2004 Aug 16.

Predicting functional sites with an automated algorithm suitable for heterogeneous datasets.

BMC Bioinformatics. 2005 May 13;6:116. doi: 10.1186/1471-2105-6-116.

Cited by

Deep Learning in Phosphoproteomics: Methods and Application in Cancer Drug Discovery.

Proteomes. 2023 May 2;11(2):16. doi: 10.3390/proteomes11020016.

Deep Learning-Based Advances In Protein Posttranslational Modification Site and Protein Cleavage Prediction.

Methods Mol Biol. 2022;2499:285-322. doi: 10.1007/978-1-0716-2317-6_15.

Predicting proteome dynamics using gene expression data.

Sci Rep. 2018 Sep 14;8(1):13866. doi: 10.1038/s41598-018-31752-4.

Phosphorylation variation during the cell cycle scales with structural propensities of proteins.

PLoS Comput Biol. 2013;9(1):e1002842. doi: 10.1371/journal.pcbi.1002842. Epub 2013 Jan 10.

PhosTryp: a phosphorylation site predictor specific for parasitic protozoa of the family trypanosomatidae.

BMC Genomics. 2011 Dec 19;12:614. doi: 10.1186/1471-2164-12-614.

Computational prediction of type III and IV secreted effectors in gram-negative bacteria.

Infect Immun. 2011 Jan;79(1):23-32. doi: 10.1128/IAI.00537-10. Epub 2010 Oct 25.

Phospho3D 2.0: an enhanced database of three-dimensional structures of phosphorylation sites.

Nucleic Acids Res. 2011 Jan;39(Database issue):D268-71. doi: 10.1093/nar/gkq936. Epub 2010 Oct 21.

Prediction of functional class of proteins and peptides irrespective of sequence homology by support vector machines.

Bioinform Biol Insights. 2009 Nov 24;1:19-47. doi: 10.4137/bbi.s315.

Accurate prediction of secreted substrates and identification of a conserved putative secretion signal for type III secretion systems.

PLoS Pathog. 2009 Apr;5(4):e1000375. doi: 10.1371/journal.ppat.1000375. Epub 2009 Apr 24.

Detection and characterization of 3D-signature phosphorylation site motifs and their contribution towards improved phosphorylation site prediction in proteins.

BMC Bioinformatics. 2009 Apr 21;10:117. doi: 10.1186/1471-2105-10-117.
