A survey of computational methods and error rate estimation procedures for peptide and protein identification in shotgun proteomics.

Affiliation

Department of Pathology, University of Michigan, Ann Arbor, MI 48109, USA.

Publication information

J Proteomics. 2010 Oct 10;73(11):2092-123. doi: 10.1016/j.jprot.2010.08.009. Epub 2010 Sep 8.

DOI:10.1016/j.jprot.2010.08.009

PMID:20816881

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2956504/

Abstract

This manuscript provides a comprehensive review of the peptide and protein identification process using tandem mass spectrometry (MS/MS) data generated in shotgun proteomic experiments. The commonly used methods for assigning peptide sequences to MS/MS spectra are critically discussed and compared, from basic strategies to advanced multi-stage approaches. Particular attention is paid to the problem of false-positive identifications. Existing statistical approaches for assessing the significance of peptide-to-spectrum matches are surveyed, ranging from single-spectrum approaches such as expectation values to global error rate estimation procedures such as false discovery rates and posterior probabilities. The importance of using auxiliary discriminant information (mass accuracy, peptide separation coordinates, digestion properties, etc.) is discussed, and advanced computational approaches for joint modeling of multiple sources of information are presented. This review also includes a detailed analysis of the issues affecting the interpretation of data at the protein level, including the amplification of error rates when going from the peptide level to the protein level, and the ambiguities in inferring the identities of sample proteins in the presence of shared peptides. Commonly used methods for computing protein-level confidence scores are discussed in detail. The review concludes with a discussion of several outstanding computational issues.
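As an illustration of the global error rate estimation mentioned in the abstract, the following is a minimal sketch of false discovery rate (FDR) estimation for peptide-to-spectrum matches. It assumes a target-decoy search, which the abstract does not name explicitly, and it uses the simple estimate FDR = decoys / targets above a score threshold. The function name `target_decoy_qvalues` and the `(score, is_decoy)` input layout are hypothetical and chosen only for this example; it is not the procedure of any particular search tool.

```python
from typing import List, Tuple


def target_decoy_qvalues(
    psms: List[Tuple[float, bool]],
) -> List[Tuple[float, float]]:
    """Estimate q-values for target PSMs from a target-decoy search.

    psms: (score, is_decoy) pairs, where a higher score means a better match.
    Returns (score, q_value) for target PSMs only, best score first.
    Assumption: decoy hits above a threshold approximate the number of
    false target hits above the same threshold.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)

    # Pass 1: running FDR estimate at each score threshold.
    targets = 0
    decoys = 0
    with_fdr = []
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdr = decoys / targets if targets else 1.0
        with_fdr.append((score, is_decoy, min(fdr, 1.0)))

    # Pass 2: q-value = smallest FDR at this threshold or any looser one,
    # which makes the reported error rate monotone in the score.
    results = []
    running_min = 1.0
    for score, is_decoy, fdr in reversed(with_fdr):
        running_min = min(running_min, fdr)
        if not is_decoy:
            results.append((score, running_min))
    results.reverse()
    return results


if __name__ == "__main__":
    example = [(10.0, False), (9.0, False), (8.0, True), (7.0, False), (6.0, False)]
    for score, q in target_decoy_qvalues(example):
        print(f"score={score:.1f} q-value={q:.2f}")
```

Accepting all target matches with a q-value below a chosen cutoff (for example 0.01) then gives a list whose estimated proportion of false identifications is at most that cutoff, under the stated assumption.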

Similar articles

Calculation of False Discovery Rate for Peptide and Protein Identification.

Methods Mol Biol. 2020;2051:145-159. doi: 10.1007/978-1-4939-9744-2_6.

Potential for false positive identifications from large databases through tandem mass spectrometry.

J Proteome Res. 2004 Sep-Oct;3(5):1082-5. doi: 10.1021/pr049946o.

Interpretation of shotgun proteomic data: the protein inference problem.

Mol Cell Proteomics. 2005 Oct;4(10):1419-40. doi: 10.1074/mcp.R500012-MCP200. Epub 2005 Jul 11.

Elective affinities--bioinformatic analysis of proteomic mass spectrometry data.

Arch Physiol Biochem. 2009 Dec;115(5):311-9. doi: 10.3109/13813450903390039.

Generalized method for probability-based peptide and protein identification from tandem mass spectrometry data and sequence database searching.

Mol Cell Proteomics. 2008 Sep;7(9):1748-54. doi: 10.1074/mcp.M800122-MCP200. Epub 2008 May 31.

Current algorithmic solutions for peptide-based proteomics data generation and identification.

Curr Opin Biotechnol. 2013 Feb;24(1):31-8. doi: 10.1016/j.copbio.2012.10.013. Epub 2012 Nov 8.

Quality assessments of peptide-spectrum matches in shotgun proteomics.

Proteomics. 2011 Mar;11(6):1086-93. doi: 10.1002/pmic.201000432. Epub 2011 Feb 7.

Deep coverage of the Escherichia coli proteome enables the assessment of false discovery rates in simple proteogenomic experiments.

Mol Cell Proteomics. 2013 Nov;12(11):3420-30. doi: 10.1074/mcp.M113.029165. Epub 2013 Aug 1.

Assigning spectrum-specific P-values to protein identifications by mass spectrometry.

Bioinformatics. 2011 Apr 15;27(8):1128-34. doi: 10.1093/bioinformatics/btr089. Epub 2011 Feb 23.

Cited by

Grape-Pi: graph-based neural networks for enhanced protein identification in proteomics pipelines.

Bioinform Adv. 2025 Apr 26;5(1):vbaf095. doi: 10.1093/bioadv/vbaf095. eCollection 2025.

To Fly, or Not to Fly, That Is the Question: A Deep Learning Model for Peptide Detectability Prediction in Mass Spectrometry.

J Proteome Res. 2025 Jun 6;24(6):2709-2726. doi: 10.1021/acs.jproteome.4c00973. Epub 2025 May 9.

PepCentric Enables Fast Repository-Scale Proteogenomics Searches.

bioRxiv. 2025 Feb 28:2025.02.24.639867. doi: 10.1101/2025.02.24.639867.

ProtGraph: a tool for the quick and comprehensive exploration and exploitation of the peptide search space derived from protein sequence databases using graphs.

Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbae671.

TopDIA: A Software Tool for Top-Down Data-Independent Acquisition Proteomics.

J Proteome Res. 2025 Jan 3;24(1):55-64. doi: 10.1021/acs.jproteome.4c00293. Epub 2024 Dec 6.

Reliability of plastid and mitochondrial localisation prediction declines rapidly with the evolutionary distance to the training set increasing.

PLoS Comput Biol. 2024 Nov 11;20(11):e1012575. doi: 10.1371/journal.pcbi.1012575. eCollection 2024 Nov.

Pollen-Food Allergy Syndrome: From Food Avoidance to Deciphering the Potential Cross-Reactivity between Pru p 3 and Ole e 7.

Nutrients. 2024 Aug 27;16(17):2869. doi: 10.3390/nu16172869.

Attracting Computational Researchers to Proteomics.

J Am Soc Mass Spectrom. 2024 Oct 2;35(10):2544-2546. doi: 10.1021/jasms.4c00185. Epub 2024 Aug 30.

An algorithm for decoy-free false discovery rate estimation in XL-MS/MS proteomics.

Bioinformatics. 2024 Jun 28;40(Suppl 1):i428-i436. doi: 10.1093/bioinformatics/btae233.

A learned score function improves the power of mass spectrometry database search.

Bioinformatics. 2024 Jun 28;40(Suppl 1):i410-i417. doi: 10.1093/bioinformatics/btae218.
