Bioinformatics Program, University of California, San Diego, La Jolla, CA 92093, USA.
Mol Cell Proteomics. 2011 Dec;10(12):M111.010017. doi: 10.1074/mcp.M111.010017. Epub 2011 Aug 23.
In high-throughput proteomics, advances in computational methods and in novel experimental strategies often rely on each other. In certain areas, mass spectrometry methods for data acquisition are ahead of the computational methods needed to interpret the resulting tandem mass spectra. In particular, although there are numerous situations in which a tandem mass spectrum can contain fragment ions from two or more peptides (a mixture spectrum), nearly all database search tools still assume that each tandem mass spectrum comes from a single peptide. Common examples include mixture spectra from co-eluting peptides in complex samples, spectra generated by data-independent acquisition methods, and spectra from peptides with complex post-translational modifications. We propose a new database search tool (MixDB) that is able to identify mixture tandem mass spectra from more than one peptide. We show that peptides can be reliably identified with up to 95% accuracy from mixture spectra while considering only 0.01% of all possible peptide pairs (a speedup of four orders of magnitude). Comparison with current database search methods indicates that our approach has better or comparable sensitivity and precision at identifying single-peptide spectra while simultaneously being able to identify 38% more peptides from mixture spectra at significantly higher precision.
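As a rough illustration of the arithmetic behind the 0.01% figure, the sketch below counts the unordered peptide pairs that an exhaustive search would have to score and converts the fraction actually considered into orders of magnitude. It is not MixDB's filtering procedure, which the abstract does not describe. The function names and the candidate count of 10,000 are assumptions chosen only for illustration.

```python
import math


def num_pairs(n: int) -> int:
    """Number of unordered pairs of distinct peptides among n candidates."""
    return n * (n - 1) // 2


def speedup_orders(fraction_considered: float) -> float:
    """Orders of magnitude saved when only this fraction of pairs is scored."""
    return -math.log10(fraction_considered)


# Hypothetical number of candidate peptides for one mixture spectrum.
n_candidates = 10_000

total_pairs = num_pairs(n_candidates)
# 0.01% of all pairs (the fraction quoted in the abstract), using integer math.
pairs_considered = total_pairs // 10_000

print(f"all pairs:        {total_pairs}")
print(f"pairs considered: {pairs_considered}")
print(f"orders of magnitude saved: {speedup_orders(0.0001):.1f}")
```

Because the number of pairs grows quadratically with the number of candidates, scoring every pair quickly becomes impractical, which is why restricting the search to a small fraction of pairs matters.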