Kim Sangtae, Gupta Nitin, Bandeira Nuno, Pevzner Pavel A
Department of Computer Science and Engineering, University of California San Diego, La Jolla, California 92093, USA.
Mol Cell Proteomics. 2009 Jan;8(1):53-69. doi: 10.1074/mcp.M800103-MCP200. Epub 2008 Aug 14.
Database search tools identify peptides by matching tandem mass spectra against a protein database. We study an alternative approach in which all plausible de novo interpretations of a spectrum (its spectral dictionary) are generated and then quickly matched against the database. We present a new algorithm, MS-Dictionary, for efficiently generating spectral dictionaries and demonstrate that it can identify spectra that are missed by database search. We argue that MS-Dictionary enables proteogenomics searches in six-frame translations of genomic sequences, searches that may be prohibitively time-consuming for existing database search approaches. We show that such searches make it possible to correct sequencing errors and find programmed frameshifts.
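The matching step described in the abstract can be illustrated with a minimal Python sketch. It assumes the spectral dictionary has already been generated and is given as a plain list of peptide strings; the dictionary generation from a spectrum is not shown. All function names (`six_frame_translation`, `match_dictionary`) are hypothetical and chosen for illustration, and exact substring matching stands in for whatever lookup the MS-Dictionary implementation actually uses.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Translate both strands in all three reading frames."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


def match_dictionary(dictionary, frames):
    """Report every exact occurrence of each dictionary peptide.

    Assumption: plain substring search, ignoring I/L and other
    mass ambiguities that a real search would need to handle.
    """
    hits = []
    for peptide in dictionary:
        for frame, protein in frames.items():
            pos = protein.find(peptide)
            while pos != -1:
                hits.append((peptide, frame, pos))
                pos = protein.find(peptide, pos + 1)
    return hits


frames = six_frame_translation("ATGGCCAAGTAA")
hits = match_dictionary(["MAK", "GQ"], frames)
```

Because a peptide can match in any of the six frames, a hit in a frame other than the annotated one is the kind of signal that could point to a sequencing error or a frameshift, although this sketch makes no attempt to interpret hits.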