Department of Pathology, University of Michigan, Ann Arbor, MI 48109, USA.
J Proteomics. 2010 Oct 10;73(11):2092-123. doi: 10.1016/j.jprot.2010.08.009. Epub 2010 Sep 8.
This manuscript provides a comprehensive review of the peptide and protein identification process using tandem mass spectrometry (MS/MS) data generated in shotgun proteomic experiments. The commonly used methods for assigning peptide sequences to MS/MS spectra are critically discussed and compared, from basic strategies to advanced multi-stage approaches. Particular attention is paid to the problem of false-positive identifications. Existing statistical approaches for assessing the significance of peptide-to-spectrum matches are surveyed, ranging from single-spectrum approaches such as expectation values to global error rate estimation procedures such as false discovery rates and posterior probabilities. The importance of using auxiliary discriminant information (mass accuracy, peptide separation coordinates, digestion properties, etc.) is discussed, and advanced computational approaches for jointly modeling multiple sources of information are presented. This review also includes a detailed analysis of the issues affecting the interpretation of data at the protein level, including the amplification of error rates when going from the peptide to the protein level, and the ambiguities in inferring the identities of sample proteins in the presence of shared peptides. Commonly used methods for computing protein-level confidence scores are discussed in detail. The review concludes with a discussion of several outstanding computational issues.