State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, Shanghai, P.R. China.
Mol Cell Proteomics. 2011 Dec;10(12):M110.005785. doi: 10.1074/mcp.M110.005785. Epub 2011 Jul 20.
Although peptide mass fingerprinting (PMF) has become complementary to tandem mass spectrometry (MS/MS) for protein identification, it remains the subject of in-depth study. Compared with MS/MS, it offers higher sample throughput, higher specificity for single peptides and lower sensitivity to unexpected post-translational modifications. In this study, we propose, implement and evaluate a uniform approach that uses support vector machines (SVMs) to incorporate individual concepts and conclusions for accurate PMF. We focus on the inherent attributes and critical issues of the theoretical spectrum (peptides), the experimental spectrum (peaks) and spectrum (mass) alignment. Eighty-one feature-matching patterns, derived from the cleavage type, uniqueness and variable masses of theoretical peptides together with the intensity rank of experimental peaks, were proposed to characterize the matching profile of the PMF procedure. We developed a new strategy that redistributes matched peak intensities to handle peak intensities shared between matches, and generated 440 parameters to digitize each feature-matching pattern, giving a feature vector of 81 × 440 = 35,640 normalized features. Through cross-training and validation on a publicly available "gold standard" PMF data set of 1733 items, the optimal multi-criteria SVM approach retained 491 final features and achieved high performance on an evaluation data set of 137 items. Compared with the Mascot, MS-Fit, ProFound and Aldente algorithms commonly used for MS-based protein identification, the feature-matching patterns algorithm separates correct identifications from random matches more clearly, with the highest sensitivity (82%), precision (97%) and F1-measure (89%) for protein identification. Several conclusions reached in this research contribute generally to MS-based protein identification.
Firstly, inherent attributes showed comparable or even greater robustness than other, explicit attributes. As an inherent attribute of the experimental spectrum, peak intensity should receive considerable attention during protein identification. Secondly, in positive peptide mass fingerprinting identifications, intense experimental peaks are very likely to align with properly digested, unique or unmodified theoretical peptides. Finally, normalization by several types of harmonic factors, including missed cleavages and mass modifications, can contribute substantially to the performance of the procedure.
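The matching profile and conclusions described above can be illustrated with a minimal Python sketch. It is not the authors' implementation and omits the SVM training entirely. Several details are assumptions not stated in the abstract: the three levels per attribute (one way to arrive at 81 = 3^4 patterns), the 50 ppm mass tolerance, splitting a shared peak's intensity equally among the peptides that match it, and normalizing matched counts by the number of theoretical candidates per class as a hypothetical stand-in for the paper's harmonic factors. All function and constant names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

# Hypothetical attribute levels. Three levels for each of four attributes give
# 3**4 = 81 patterns, matching the count in the abstract, but the levels the
# authors actually used are not stated there.
CLEAVAGE = ("full", "missed", "semi")
UNIQUENESS = ("unique", "shared", "degenerate")
MODIFICATION = ("none", "fixed", "variable")
INTENSITY_RANK = ("high", "medium", "low")
PATTERNS = list(product(CLEAVAGE, UNIQUENESS, MODIFICATION, INTENSITY_RANK))
PARAMS_PER_PATTERN = 440  # number of parameters per pattern, from the abstract


def rank_bins(intensities):
    """Label each peak 'high', 'medium' or 'low' by intensity tercile."""
    order = sorted(range(len(intensities)), key=lambda i: -intensities[i])
    bins = [None] * len(intensities)
    for rank, i in enumerate(order):
        bins[i] = INTENSITY_RANK[3 * rank // len(intensities)]
    return bins


def match_and_redistribute(peptides, peaks, tol_ppm=50.0):
    """Align theoretical peptides with peaks, splitting shared peak intensity.

    peptides: list of (mass, cleavage, uniqueness, modification)
    peaks: list of (mass, intensity)
    Returns {pattern: redistributed matched intensity}.
    """
    bins = rank_bins([intensity for _, intensity in peaks])
    hits = defaultdict(list)  # peak index -> indices of matching peptides
    for p_idx, (mass, *_labels) in enumerate(peptides):
        for k_idx, (peak_mass, _) in enumerate(peaks):
            if abs(peak_mass - mass) / mass * 1e6 <= tol_ppm:
                hits[k_idx].append(p_idx)
    profile = defaultdict(float)
    for k_idx, p_indices in hits.items():
        # Equal split of a shared peak's intensity (assumption).
        share = peaks[k_idx][1] / len(p_indices)
        for p_idx in p_indices:
            _, cleavage, uniqueness, modification = peptides[p_idx]
            profile[(cleavage, uniqueness, modification, bins[k_idx])] += share
    return dict(profile)


def normalize_by_candidates(matched, candidates):
    """Divide matched counts by theoretical candidate counts per class
    (a hypothetical stand-in for the paper's harmonic factors)."""
    n_match = Counter(matched)
    n_cand = Counter(candidates)
    return {cls: n_match[cls] / n_cand[cls] for cls in n_cand}


def f1_measure(sensitivity, precision):
    """Harmonic mean of sensitivity (recall) and precision."""
    return 2 * sensitivity * precision / (sensitivity + precision)
```

For instance, theoretical peptides at 1000.50 and 1000.52 Da both fall within 50 ppm of a 900-count peak at 1000.51 Da, so under the equal-split assumption each contributes 450 counts to its own pattern. `len(PATTERNS) * PARAMS_PER_PATTERN` reproduces the 35,640 features quoted above, and `f1_measure(0.82, 0.97)` rounds to 0.89, consistent with the reported F1-measure. In a toy protein with 20 missed-cleavage candidates but only 5 fully cleaved ones, `normalize_by_candidates` keeps the many missed-cleavage candidates from inflating the score simply by offering more chances for a random match.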