Tanner Stephen, Shu Hongjun, Frank Ari, Wang Ling-Chi, Zandi Ebrahim, Mumby Marc, Pevzner Pavel A, Bafna Vineet
Department of Bioengineering and Computer Science Department, APM 3832, University of California-San Diego, 9500 Gilman Drive, La Jolla, California 92093-0114, USA.
Anal Chem. 2005 Jul 15;77(14):4626-39. doi: 10.1021/ac050102d.
Reliable identification of posttranslational modifications is key to understanding various cellular regulatory processes. We describe a tool, InsPecT, to identify posttranslational modifications using tandem mass spectrometry data. InsPecT constructs database filters, an approach that has proved very successful in genomic searches. Given an MS/MS spectrum S and a database D, a database filter selects a small fraction of D that contains, with high probability, a peptide that produced S. InsPecT uses peptide sequence tags as efficient filters that reduce the size of the database by a few orders of magnitude while retaining the correct peptide with very high probability. In addition to filtering, InsPecT uses novel algorithms for scoring and validating matches in the presence of modifications, without explicit enumeration of all variants. InsPecT identifies modified peptides with accuracy better than or equivalent to that of other database search tools, while being two orders of magnitude faster than SEQUEST and substantially faster than X!TANDEM on complex mixtures. The tool was used to identify a number of novel modifications in different data sets, including many phosphopeptides in data provided by the Alliance for Cellular Signaling that were missed by other tools.
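The tag-based filtering idea described above can be illustrated with a minimal sketch. It is not InsPecT's implementation: the function name `filter_by_tag`, the tolerance `tol`, and the choice to check only the prefix mass (leaving the suffix unconstrained) are assumptions made for illustration. A tag is taken here to be a short residue string plus the mass expected to precede it in the peptide; the filter keeps only those database positions where the tag occurs and some run of preceding residues sums to that mass.

```python
# Monoisotopic residue masses in daltons for a small subset of amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "K": 128.09496,
    "E": 129.04259, "R": 156.10111,
}


def filter_by_tag(database, tag, prefix_mass, tol=0.5):
    """Return (protein_index, peptide_start, tag_end) for each tag occurrence
    whose preceding residues sum to prefix_mass within tol.

    Hypothetical helper for illustration only; the suffix mass is not checked.
    """
    hits = []
    for idx, protein in enumerate(database):
        pos = protein.find(tag)
        while pos != -1:
            mass = 0.0
            start = pos
            # Extend leftwards one residue at a time until the prefix mass
            # matches, overshoots, or the protein start is reached.
            while True:
                if abs(mass - prefix_mass) <= tol:
                    hits.append((idx, start, pos + len(tag)))
                    break
                if start == 0 or mass > prefix_mass + tol:
                    break
                start -= 1
                # Residues missing from the table count as an overshoot.
                mass += RESIDUE_MASS.get(protein[start], float("inf"))
            pos = protein.find(tag, pos + 1)
    return hits
```

For example, with the toy database `["GASPVTLKER", "KKLTVAGE"]`, the tag `"VTL"` and a prefix mass equal to that of `SP` (about 184.085 Da), only the first sequence survives the filter. Scoring and validation, including placement of modifications, would then run only on the surviving candidates.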