Tabb David L, Thompson Melissa R, Khalsa-Moyers Gurusahai, VerBerkmoes Nathan C, McDonald W Hayes
Life Sciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, USA.
J Am Soc Mass Spectrom. 2005 Aug;16(8):1250-61. doi: 10.1016/j.jasms.2005.04.010.
Shotgun proteomics experiments require the collection of thousands of tandem mass spectra, and these datasets will continue to grow as new instruments that scan at even higher rates become available. Such data contain substantial redundancy, with spectra from a particular peptide being acquired many times during a single LC-MS/MS experiment. In this article, we present MS2Grouper, an algorithm that detects spectral duplication, assesses groups of related spectra, and replaces these groups with synthetic representative spectra. Errors in detecting spectral similarity are corrected using a paraclique criterion: spectra are only assessed as groups if they are part of a clique of at least three completely interrelated spectra or are subsequently added to such cliques by being similar to all but one of the clique members. A greedy algorithm constructs a representative spectrum for each group by iteratively removing the tallest peaks from the spectral collection and matching to peaks in the other spectra. This strategy is shown to be effective in reducing spectral counts by up to 20% in LC-MS/MS datasets from protein standard mixtures and proteomes, reducing database search times without a concomitant reduction in identified peptides.
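The two steps named in the abstract, paraclique grouping and greedy construction of a representative spectrum, can be illustrated with a minimal Python sketch. This is not the MS2Grouper implementation. The function names (`find_paracliques`, `representative_spectrum`), the precomputed list of similar pairs, the m/z tolerance `tol`, and the way matched peaks are merged (intensity-weighted m/z, intensity averaged over the group) are all assumptions made for illustration. The abstract does not specify the similarity measure or the merging rule.

```python
from itertools import combinations


def find_paracliques(n, similar_pairs, min_clique=3):
    """Group spectra 0..n-1 given pairs judged similar (hypothetical input).

    A group starts from a clique of at least `min_clique` mutually similar
    spectra; further spectra join if they miss at most one clique member.
    """
    adj = {i: set() for i in range(n)}
    for a, b in similar_pairs:
        adj[a].add(b)
        adj[b].add(a)
    unassigned = set(range(n))
    groups = []
    while True:
        # Seed: first set of min_clique unassigned, fully interrelated spectra.
        seed = None
        for combo in combinations(sorted(unassigned), min_clique):
            if all(v in adj[u] for u, v in combinations(combo, 2)):
                seed = set(combo)
                break
        if seed is None:
            break
        clique = set(seed)
        # Grow the clique with spectra similar to every current member.
        grown = True
        while grown:
            grown = False
            for v in sorted(unassigned - clique):
                if clique <= adj[v]:
                    clique.add(v)
                    grown = True
        # Paraclique step: admit spectra similar to all but one clique member.
        group = set(clique)
        for v in sorted(unassigned - clique):
            if len(clique - adj[v]) <= 1:
                group.add(v)
        groups.append(sorted(group))
        unassigned -= group
    return groups


def representative_spectrum(spectra, tol=0.5):
    """Greedy consensus of spectra given as lists of (mz, intensity) peaks.

    Assumes positive intensities. The merging rule is an assumption.
    """
    pools = [list(s) for s in spectra]
    rep = []
    while any(pools):
        # Remove the tallest remaining peak across the whole collection.
        src = max((i for i, p in enumerate(pools) if p),
                  key=lambda i: max(pk[1] for pk in pools[i]))
        top = max(pools[src], key=lambda pk: pk[1])
        pools[src].remove(top)
        matched = [top]
        # Match it to the tallest peak within tol in each other spectrum.
        for i, pool in enumerate(pools):
            if i == src:
                continue
            near = [pk for pk in pool if abs(pk[0] - top[0]) <= tol]
            if near:
                best = max(near, key=lambda pk: pk[1])
                pool.remove(best)
                matched.append(best)
        total = sum(inten for _, inten in matched)
        mz = sum(m * inten for m, inten in matched) / total
        rep.append((mz, total / len(spectra)))
    return sorted(rep)


if __name__ == "__main__":
    pairs = [(0, 1), (0, 2), (1, 2), (0, 3), (1, 3), (4, 5)]
    print(find_paracliques(6, pairs))
    print(representative_spectrum([[(100.0, 10.0), (200.0, 5.0)],
                                   [(100.2, 10.0), (200.1, 5.0)]]))
```

In the usage example, spectra 0, 1, and 2 form the seed clique. Spectrum 3 is similar to all but one of them and joins the group. Spectra 4 and 5 resemble only each other, so they stay ungrouped.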