Huang Yingying, Tseng George C, Yuan Shinsheng, Pasa-Tolic Ljiljana, Lipton Mary S, Smith Richard D, Wysocki Vicki H
Department of Chemistry, University of Arizona, Tucson, Arizona 85721, USA.
J Proteome Res. 2008 Jan;7(1):70-9. doi: 10.1021/pr070106u. Epub 2007 Dec 4.
Although tandem mass spectrometry (MS/MS) has become an integral part of proteomics, most widely used algorithms give little weight to intensity patterns in MS/MS spectra because these patterns are not yet fully understood. Here a knowledge mining approach is demonstrated to discover fragmentation intensity patterns and elucidate the chemical factors behind such patterns. Fragmentation intensity information from 28,330 ion trap peptide MS/MS spectra of different charge states and sequences underwent unsupervised clustering using a penalized K-means algorithm. Without any prior chemistry assumptions, four clusters with distinctive fragmentation patterns were obtained. A decision tree was generated to investigate the peptide sequence motifs and charge states that give rise to these fragmentation patterns. This data-mining scheme is generally applicable to any large data set. It bypasses the common prior knowledge constraints and reports on the overall peptide fragmentation behavior. It improves the understanding of gas-phase peptide dissociation and provides a foundation for new or improved protein identification algorithms.
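The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one simplified reading of penalized K-means, in which a spectrum whose squared Euclidean distance to its nearest centroid exceeds a penalty value is left in a noise set rather than forced into a cluster. The function names, the `penalty` parameter, the choice of distance, and the three-value toy intensity profiles are all hypothetical and chosen only for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def penalized_kmeans(points, init_centroids, penalty, n_iter=100):
    """Simplified penalized K-means (illustrative assumption, not the paper's code).

    Each point joins its nearest centroid unless that squared distance
    exceeds `penalty`, in which case it is labeled -1 (noise set).
    Returns (labels, centroids).
    """
    centroids = [list(c) for c in init_centroids]
    k = len(centroids)
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid, or noise if too far away.
        new_labels = []
        for p in points:
            dists = [sq_dist(p, c) for c in centroids]
            best = min(range(k), key=dists.__getitem__)
            new_labels.append(best if dists[best] <= penalty else -1)
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
    return labels, centroids


def penalized_loss(points, labels, centroids, penalty):
    """Within-cluster squared distance plus `penalty` per noise point."""
    loss = 0.0
    for p, lab in zip(points, labels):
        loss += penalty if lab == -1 else sq_dist(p, centroids[lab])
    return loss


# Invented toy "intensity profiles": two tight groups and one scattered profile.
profiles = [
    [0.9, 0.1, 0.0], [0.8, 0.2, 0.0], [1.0, 0.0, 0.0],
    [0.0, 0.1, 0.9], [0.0, 0.2, 0.8], [0.1, 0.0, 0.9],
    [0.3, 0.4, 0.3],
]
labels, centroids = penalized_kmeans(
    profiles, init_centroids=[profiles[0], profiles[3]], penalty=0.1
)
print(labels)
print(penalized_loss(profiles, labels, centroids, penalty=0.1))
```

In this sketch the penalty acts as a per-spectrum cost for leaving a profile unclustered, so a small penalty leaves scattered profiles in the noise set and a large penalty forces every profile into a cluster. The resulting cluster labels could then serve as the target for a decision tree built on sequence and charge-state features, as the abstract describes.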