Wu Xue, Tseng Chau-Wen, Edwards Nathan
Department of Computer Science, University of Maryland, College Park, MD 20742, USA.
J Comput Biol. 2007 Oct;14(8):1025-43. doi: 10.1089/cmb.2007.0071.
Peptide identification by tandem mass spectrometry is the dominant proteomics workflow for protein characterization in complex samples. The peptide fragmentation spectra generated by these workflows exhibit characteristic fragmentation patterns that can be used to identify the peptide. In other fields, where the compounds of interest do not have the convenient linear structure of peptides, fragmentation spectra are identified by comparing new spectra with libraries of identified spectra, an approach called spectral matching. In contrast to the sequence-based tandem mass spectrometry search engines used for peptides, spectral matching can use the intensities of fragment peaks in library spectra to assess the quality of a match. We evaluate a hidden Markov model approach to spectral matching (HMMatch), in which many examples of a peptide's fragmentation spectrum are summarized in a generative probabilistic model that captures the consensus and variation of each peak's intensity. We demonstrate that HMMatch has good specificity and superior sensitivity compared to sequence database search engines such as X!Tandem. HMMatch achieves good results from relatively few training spectra, is fast to train, and can evaluate many spectra per second. A statistical significance model places HMMatch scores on a unified scale, so that they can be compared with each other and with the scores of other peptide identification tools. HMMatch's concordance with X!Tandem, Mascot, and NIST's MS Search is similar to the concordance among those tools themselves, suggesting that each tool can assign peptides to spectra that the others miss. Finally, we show that HMMatch models can be extrapolated beyond a single peptide's training spectra to the spectra of related peptides, expanding the application of spectral matching techniques beyond the set of peptides previously observed.
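The abstract describes summarizing many replicate spectra of one peptide in a probabilistic model of each peak's intensity. The sketch below illustrates that idea under simplifying assumptions and is not HMMatch itself: it drops the hidden Markov model's states and transitions and keeps only an independent categorical distribution over discretized intensity levels (absent, low, medium, high) for each expected fragment ion. These distributions are estimated from training spectra with pseudocounts, and a new spectrum is scored as a log-odds ratio against a background distribution. The function names, intensity thresholds, 0.5 m/z matching tolerance, fragment m/z values, and background probabilities are all hypothetical choices for illustration.

```python
import math
from collections import Counter

# Hypothetical intensity categories: 0 = absent, 1 = low, 2 = medium, 3 = high.
LEVELS = 4


def discretize(intensity, base_peak):
    """Map a raw intensity to a category, relative to the spectrum's base peak."""
    if intensity <= 0 or base_peak <= 0:
        return 0
    frac = intensity / base_peak
    if frac < 0.1:
        return 1
    if frac < 0.4:
        return 2
    return 3


def peak_levels(spectrum, fragment_mzs, tol=0.5):
    """Category of the strongest peak within tol of each expected fragment m/z."""
    base_peak = max((inten for _, inten in spectrum), default=0.0)
    levels = []
    for target in fragment_mzs:
        best = max((inten for mz, inten in spectrum if abs(mz - target) <= tol),
                   default=0.0)
        levels.append(discretize(best, base_peak))
    return levels


def train(spectra, fragment_mzs, pseudocount=1.0):
    """Per-fragment categorical distribution over intensity levels, with pseudocounts."""
    counts = [Counter() for _ in fragment_mzs]
    for spectrum in spectra:
        for pos, level in enumerate(peak_levels(spectrum, fragment_mzs)):
            counts[pos][level] += 1
    model = []
    for c in counts:
        total = sum(c.values()) + pseudocount * LEVELS
        model.append([(c[k] + pseudocount) / total for k in range(LEVELS)])
    return model


def score(model, spectrum, fragment_mzs, background):
    """Log-odds of the spectrum under the peptide model versus a background model."""
    levels = peak_levels(spectrum, fragment_mzs)
    return sum(math.log(model[pos][lvl] / background[lvl])
               for pos, lvl in enumerate(levels))


# Toy example: three replicate spectra of one hypothetical peptide,
# each spectrum a list of (m/z, intensity) pairs.
fragment_mzs = [175.1, 276.2, 389.3]
training = [
    [(175.1, 100.0), (276.2, 30.0), (389.3, 5.0)],
    [(175.1, 90.0), (276.2, 35.0), (389.3, 8.0)],
    [(175.1, 100.0), (276.2, 25.0)],
]
# Hypothetical frequencies of each intensity level in unrelated spectra.
background = [0.7, 0.1, 0.1, 0.1]

model = train(training, fragment_mzs)
match = [(175.1, 95.0), (276.2, 30.0), (389.3, 6.0)]
decoy = [(175.1, 5.0), (500.0, 100.0)]
print(round(score(model, match, fragment_mzs, background), 2))
print(round(score(model, decoy, fragment_mzs, background), 2))
```

A spectrum whose peaks follow the training consensus scores above zero, while one missing the expected fragments scores below zero. Because each fragment contributes its own log-odds term, a peak whose intensity is consistent across the training spectra rewards a matching spectrum more than a peak whose intensity varies. This is the kind of intensity information that sequence-based search engines do not use.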