Department of Proteomics and Signal Transduction, Max-Planck Institute of Biochemistry, Am Klopferspitz 18, D-82152 Martinsried, Germany.
Mol Cell Proteomics. 2012 Nov;11(11):1500-9. doi: 10.1074/mcp.M112.020271. Epub 2012 Aug 10.
An important step in mass spectrometry (MS)-based proteomics is the identification of peptides by their fragment spectra. Regardless of the identification score achieved, almost all tandem-MS (MS/MS) spectra contain remaining peaks that are not assigned by the search engine. These peaks may be explainable by human experts, but the scale of modern proteomics experiments makes this impractical. In computer science, Expert Systems are a mature technology to implement a list of rules generated by interviews with practitioners. We here develop such an Expert System, making use of literature knowledge as well as a large body of high mass accuracy and pure fragmentation spectra. Interestingly, we find that even with high mass accuracy data, rule sets can quickly become too complex, leading to over-annotation. Therefore, we establish a rigorous false discovery rate, calculated by random insertion of peaks from a large collection of other MS/MS spectra, and use it to develop an optimized knowledge base. This rule set correctly annotates almost all peaks of medium or high abundance. For high resolution HCD data, median intensity coverage of fragment peaks in MS/MS spectra increases from 58% by search engine annotation alone to 86%. The resulting annotation performance surpasses a human expert, especially on complex spectra such as those of larger phosphorylated peptides. Our system is also applicable to high resolution collision-induced dissociation data. It is available both as a part of MaxQuant and via a webserver that only requires an MS/MS spectrum and the corresponding peptide sequence, and which outputs publication quality, annotated MS/MS spectra (www.biochem.mpg.de/mann/tools/). It provides expert knowledge to beginners in the field of MS-based proteomics and helps advanced users to focus on unusual and possibly novel types of fragment ions.
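The two quantities the abstract relies on, intensity coverage and a false discovery rate based on randomly inserted peaks, can be illustrated with a minimal sketch. This is not the Expert System's rule set or its published FDR procedure: the function names, the simple ppm-tolerance matching, and the default of 10 ppm are all assumptions made for illustration, and the abstract does not give the exact formula used. The sketch only shows the general idea that peaks borrowed from unrelated spectra should rarely be annotated, so the rate at which they are annotated estimates how often an annotation arises by chance.

```python
import random

def is_match(mz, theoretical_mzs, tol_ppm):
    """True if mz lies within tol_ppm of any theoretical fragment m/z.
    Plain ppm matching is an assumption, not the paper's rule set."""
    return any(abs(mz - t) / t * 1e6 <= tol_ppm for t in theoretical_mzs)

def intensity_coverage(peaks, theoretical_mzs, tol_ppm=10.0):
    """Fraction of total peak intensity carried by annotated peaks.
    peaks is a list of (m/z, intensity) pairs."""
    total = sum(intensity for _, intensity in peaks)
    if total == 0:
        return 0.0
    explained = sum(
        intensity
        for mz, intensity in peaks
        if is_match(mz, theoretical_mzs, tol_ppm)
    )
    return explained / total

def decoy_annotation_rate(theoretical_mzs, decoy_pool, n_insert,
                          tol_ppm=10.0, trials=1000, seed=0):
    """Average fraction of randomly inserted decoy peaks that get annotated.
    decoy_pool holds m/z values taken from other, unrelated spectra."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        inserted = rng.sample(decoy_pool, n_insert)
        hits += sum(is_match(mz, theoretical_mzs, tol_ppm) for mz in inserted)
    return hits / (trials * n_insert)

# Hypothetical toy spectrum: two of three peaks match a theoretical fragment.
peaks = [(175.119, 100.0), (300.0, 50.0), (262.151, 50.0)]
theoretical = [175.119, 262.151]
coverage = intensity_coverage(peaks, theoretical)  # 150 / 200 = 0.75

# Hypothetical decoy pool; a more permissive rule set would annotate more of it.
decoy_pool = [150.0, 175.1195, 400.0, 500.0]
chance_rate = decoy_annotation_rate(theoretical, decoy_pool, n_insert=2)
```

In this framing, adding rules raises `coverage` but also tends to raise `chance_rate`, which is the over-annotation trade-off the abstract describes.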