Probability-based protein identification by searching sequence databases using mass spectrometry data.

Author information

Perkins D N, Pappin D J, Creasy D M, Cottrell J S

Affiliation

Imperial Cancer Research Fund, London, UK.

Publication information

Electrophoresis. 1999 Dec;20(18):3551-67. doi: 10.1002/(SICI)1522-2683(19991201)20:18<3551::AID-ELPS3551>3.0.CO;2-2.

PMID:10612281

Abstract

Several algorithms have been described in the literature for protein identification by searching a sequence database using mass spectrometry data. In some approaches, the experimental data are peptide molecular weights from the digestion of a protein by an enzyme. Other approaches use tandem mass spectrometry (MS/MS) data from one or more peptides. Still others combine mass data with amino acid sequence data. We present results from a new computer program, Mascot, which integrates all three types of search. The scoring algorithm is probability based, which has a number of advantages: (i) A simple rule can be used to judge whether a result is significant or not. This is particularly useful in guarding against false positives. (ii) Scores can be compared with those from other types of search, such as sequence homology. (iii) Search parameters can be readily optimised by iteration. The strengths and limitations of probability-based scoring are discussed, particularly in the context of high throughput, fully automated protein identification.
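The "simple rule" for judging significance in advantage (i) can be illustrated with a short sketch. The abstract does not state the formula, so the following rests on an assumption: that a score is reported as -10·log10(P), where P is the probability that the observed match is a random event, and that a match is significant when a chance match among N candidate entries would be expected with frequency below a chosen level (0.05 here). The function names are hypothetical and are not part of Mascot.

```python
import math


def score_from_probability(p: float) -> float:
    """Convert a random-match probability into a -10*log10(p) score.

    Assumed scoring convention; not taken from the abstract itself.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in the interval (0, 1]")
    return -10.0 * math.log10(p)


def significance_threshold(num_candidates: int, alpha: float = 0.05) -> float:
    """Score above which a chance match among num_candidates entries
    is expected with frequency below alpha."""
    if num_candidates < 1:
        raise ValueError("num_candidates must be at least 1")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be in the interval (0, 1)")
    return -10.0 * math.log10(alpha / num_candidates)


def is_significant(score: float, num_candidates: int, alpha: float = 0.05) -> bool:
    """Apply the rule: a score is significant if it exceeds the threshold."""
    return score > significance_threshold(num_candidates, alpha)


if __name__ == "__main__":
    # Hypothetical database size, chosen only for illustration.
    n_entries = 500_000
    print(round(significance_threshold(n_entries), 2))
    print(is_significant(75.0, n_entries))
    print(is_significant(60.0, n_entries))
```

Under these assumptions the threshold grows with the logarithm of the number of candidates, so searching a larger database requires a higher score before a match counts as significant.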

Similar articles

Probability-based validation of protein identifications using a modified SEQUEST algorithm.

Anal Chem. 2002 Nov 1;74(21):5593-9. doi: 10.1021/ac025826t.

Comparative evaluation of tandem MS search algorithms using a target-decoy search strategy.

Mol Cell Proteomics. 2007 Sep;6(9):1599-608. doi: 10.1074/mcp.M600469-MCP200. Epub 2007 May 28.

A mass accuracy sensitive probability based scoring algorithm for database searching of tandem mass spectrometry data.

BMC Bioinformatics. 2007 Apr 20;8:133. doi: 10.1186/1471-2105-8-133.

Qscore: an algorithm for evaluating SEQUEST database search results.

J Am Soc Mass Spectrom. 2002 Apr;13(4):378-86. doi: 10.1016/S1044-0305(02)00352-5.

A hypergeometric probability model for protein identification and validation using tandem mass spectral data and protein sequence databases.

Anal Chem. 2003 Aug 1;75(15):3792-8. doi: 10.1021/ac034157w.

Identification of proteins from non-model organisms using mass spectrometry: application to a hibernating mammal.

J Proteome Res. 2006 Apr;5(4):829-39. doi: 10.1021/pr050306a.

Large-scale database searching using tandem mass spectra: looking up the answer in the back of the book.

Nat Methods. 2004 Dec;1(3):195-202. doi: 10.1038/nmeth725.

A Heuristic method for assigning a false-discovery rate for protein identifications from Mascot database search results.

Mol Cell Proteomics. 2005 Jun;4(6):762-72. doi: 10.1074/mcp.M400215-MCP200. Epub 2005 Feb 9.

A suffix tree approach to the interpretation of tandem mass spectra: applications to peptides of non-specific digestion and post-translational modifications.

Bioinformatics. 2003 Oct;19 Suppl 2:ii113-21. doi: 10.1093/bioinformatics/btg1068.

Cited by

Algae-dominated metaproteomes uncover cellular adaptations to life on the Greenland Ice Sheet.

NPJ Biofilms Microbiomes. 2025 Sep 9;11(1):181. doi: 10.1038/s41522-025-00770-2.

Genetic parallels in biomineralization of the calcareous sponge and stony corals.

Elife. 2025 Sep 9;14:RP106239. doi: 10.7554/eLife.106239.

Acclimation of Synechocystis sp. PCC 6803 to Alkaline pH Under Ambient Air.

Physiol Plant. 2025 Sep-Oct;177(5):e70474. doi: 10.1111/ppl.70474.

QSample: An Automated System for Rapid Monitoring of Quality Indicators in Proteomics Samples.

J Proteome Res. 2025 Sep 5;24(9):4816-4824. doi: 10.1021/acs.jproteome.5c00119. Epub 2025 Aug 19.

Anion exchange chromatography-based purification of plant-derived nanovesicles from L.: molecular profiling and bioactivity in human cells.

Front Bioeng Biotechnol. 2025 Jul 31;13:1617478. doi: 10.3389/fbioe.2025.1617478. eCollection 2025.

High-resolution Cryo-EM Analysis of the Therapeutic Pseudomonas Phage Pa223.

J Mol Biol. 2025 Aug 12;437(21):169386. doi: 10.1016/j.jmb.2025.169386.

The Emerging Role of Omics-Based Approaches in Plant Virology.

Viruses. 2025 Jul 15;17(7):986. doi: 10.3390/v17070986.

Reconstructing medieval diets through the integration of stable isotope and proteomic analyses from two European burial sites.

Sci Rep. 2025 Jul 21;15(1):26442. doi: 10.1038/s41598-025-10103-0.

Biological Function Assignment across Taxonomic Levels in Mass-Spectrometry-Based Metaproteomics via a Modified Expectation Maximization Algorithm.

J Proteome Res. 2025 Aug 1;24(8):3818-3832. doi: 10.1021/acs.jproteome.4c01125. Epub 2025 Jul 18.

Biological Function Assignment Across Taxonomic Levels in Mass-Spectrometry-Based Metaproteomics via a Modified Expectation Maximization Algorithm.

bioRxiv. 2025 Jun 17:2025.06.12.659309. doi: 10.1101/2025.06.12.659309.
