An iterative statistical approach to the identification of protein phosphorylation motifs from large-scale data sets.

Author information

Schwartz Daniel, Gygi Steven P

Affiliation

Department of Cell Biology, 240 Longwood Ave., Harvard Medical School, Boston, Massachusetts 02115, USA.

Publication information

Nat Biotechnol. 2005 Nov;23(11):1391-8. doi: 10.1038/nbt1146.

DOI:10.1038/nbt1146

PMID:16273072

Abstract

With the recent exponential increase in protein phosphorylation sites identified by mass spectrometry, a unique opportunity has arisen to understand the motifs surrounding such sites. Here we present an algorithm designed to extract motifs from large data sets of naturally occurring phosphorylation sites. The methodology relies on the intrinsic alignment of phospho-residues and the extraction of motifs through iterative comparison to a dynamic statistical background. Results show the identification of dozens of novel and known phosphorylation motifs from recently published serine, threonine and tyrosine phosphorylation studies. When applied to a linguistic data set to test the versatility of the approach, the algorithm successfully extracted hundreds of language motifs. This method, in addition to shedding light on the consensus sequences of identified and as yet unidentified kinases and modular protein domains, may also eventually be used as a tool to determine potential phosphorylation sites in proteins of interest.
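
The abstract describes the method only at a high level: sequences are aligned on the phospho-residue, and motifs are extracted by repeated comparison against a background that changes as the search proceeds. The Python sketch below is one plausible reading of that description, not the authors' implementation. The binomial test, the greedy choice of the most significant residue/position pair, the thresholds `p_cut` and `min_occ`, and all function names are assumptions made for illustration.

```python
from math import comb


def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def matches(seq, motif):
    """True if seq carries every fixed residue of motif ({position: residue})."""
    return all(seq[pos] == res for pos, res in motif.items())


def find_one_motif(fg, bg, center, p_cut, min_occ):
    """Greedily fix the most over-represented residue/position pair, then
    shrink BOTH foreground and background to the sequences that carry it
    (the assumed 'dynamic' background), until nothing is significant."""
    motif = {}
    while fg and bg:
        best = None
        for pos in range(len(fg[0])):
            if pos == center or pos in motif:
                continue
            for res in sorted({s[pos] for s in fg}):
                k = sum(s[pos] == res for s in fg)
                if k < min_occ:
                    continue
                p_bg = sum(s[pos] == res for s in bg) / len(bg)
                pval = binom_tail(k, len(fg), p_bg)
                if pval < p_cut and (best is None or pval < best[0]):
                    best = (pval, pos, res)
        if best is None:
            break
        _, pos, res = best
        motif[pos] = res
        fg = [s for s in fg if s[pos] == res]
        bg = [s for s in bg if s[pos] == res]
    return motif


def extract_motifs(fg, bg, center, p_cut=1e-6, min_occ=20):
    """Extract motifs one at a time, removing matched sequences from both
    sets before searching again. Default thresholds are illustrative only."""
    fg, bg = list(fg), list(bg)
    motifs = []
    while fg and bg:
        motif = find_one_motif(fg, bg, center, p_cut, min_occ)
        if not motif:
            break
        motifs.append(motif)
        fg = [s for s in fg if not matches(s, motif)]
        bg = [s for s in bg if not matches(s, motif)]
    return motifs


def format_motif(motif, center, center_res, width):
    """Render a motif as a string, with '.' for unconstrained positions."""
    chars = ["."] * width
    chars[center] = center_res
    for pos, res in motif.items():
        chars[pos] = res
    return "".join(chars)
```

Here `fg` would hold equal-length peptide windows centered on the observed phosphorylated residue, and `bg` would hold windows of the same width centered on the same residue type drawn from a reference proteome. Because every sequence shares the central residue, no separate alignment step is needed.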
