The generating function of CID, ETD, and CID/ETD pairs of tandem mass spectra: applications to database search.

Affiliation

Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA 92093, USA.

Publication information

Mol Cell Proteomics. 2010 Dec;9(12):2840-52. doi: 10.1074/mcp.M110.003731. Epub 2010 Sep 9.

Abstract

The recent emergence of new mass spectrometry techniques (e.g. electron transfer dissociation, ETD) and the improved availability of additional proteases (e.g. Lys-N) for protein digestion in high-throughput experiments have raised the challenge of designing new algorithms for interpreting the resulting new types of tandem mass (MS/MS) spectra. Traditional MS/MS database search algorithms such as SEQUEST and Mascot were originally designed for collision induced dissociation (CID) of tryptic peptides, and their CID-specific scoring functions are largely based on expert knowledge about the fragmentation of tryptic peptides rather than on machine learning techniques. As a result, these algorithms perform suboptimally on spectra from new mass spectrometry technologies and on nontryptic peptides. We recently proposed the generating function approach (MS-GF) for CID spectra of tryptic peptides. In this study, we extend MS-GF to automatically derive scoring parameters from a set of annotated MS/MS spectra of any type (e.g. CID, ETD, etc.), and present MS-GFDB, a new database search tool based on MS-GF. We show that MS-GFDB outperforms Mascot for ETD spectra or for peptides digested with Lys-N. For example, for ETD spectra, the numbers of tryptic and Lys-N peptides identified by MS-GFDB increased by factors of 2.7 and 2.6, respectively, compared with Mascot. Moreover, even after a decade of Mascot development for analyzing CID spectra of tryptic peptides, MS-GFDB (which is not specifically tailored to CID spectra or tryptic peptides) yielded a 28% increase over Mascot in the number of peptide identifications. Finally, we propose a statistical framework for analyzing multiple spectra from the same precursor (e.g. CID/ETD spectral pairs) and assigning p values to peptide-spectrum-spectrum matches.
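The generating function idea can be illustrated with a toy sketch: instead of scoring only database peptides, a dynamic program computes the probability-weighted score distribution over all peptides whose mass equals the precursor mass, and the spectral probability of a match is the total probability of peptides scoring at least as well. This is not MS-GF's actual scoring model. Integer residue masses, the per-prefix-mass scores (`prefix_scores`), the amino acid frequencies (`aa_probs`), and all function names are hypothetical assumptions for illustration. Likewise, summing the CID and ETD scores at each prefix mass is only one plausible, assumed way to treat a spectral pair, not necessarily the paper's framework.

```python
from collections import defaultdict


def score_distribution(prefix_scores, aa_probs):
    """Score distribution over all peptides of integer mass len(prefix_scores) - 1.

    prefix_scores[m]: hypothetical integer score for an ion at prefix mass m.
    aa_probs: hypothetical map from (positive) integer residue mass to frequency.
    Returns {score: total probability of peptides with that score}.
    """
    total_mass = len(prefix_scores) - 1
    # dist[m] maps score -> total probability of peptide prefixes of mass m
    dist = [defaultdict(float) for _ in range(total_mass + 1)]
    dist[0][0] = 1.0  # the empty prefix has score 0 and probability 1
    for m in range(1, total_mass + 1):
        for aa_mass, prob in aa_probs.items():
            if aa_mass > m:
                continue
            # Extend every prefix of mass m - aa_mass by one residue,
            # adding the score of the new prefix mass m.
            for score, p in dist[m - aa_mass].items():
                dist[m][score + prefix_scores[m]] += p * prob
    return dict(dist[total_mass])


def spectral_probability(prefix_scores, aa_probs, threshold):
    """Total probability of peptides scoring at least `threshold`."""
    dist = score_distribution(prefix_scores, aa_probs)
    return sum(p for s, p in dist.items() if s >= threshold)


def pair_spectral_probability(cid_scores, etd_scores, aa_probs, threshold):
    """Assumed pair treatment: add the two spectra's scores per prefix mass."""
    combined = [a + b for a, b in zip(cid_scores, etd_scores)]
    return spectral_probability(combined, aa_probs, threshold)


if __name__ == "__main__":
    toy_alphabet = {1: 0.5, 2: 0.5}  # hypothetical two-residue alphabet
    print(spectral_probability([0, 1, 1, 0], toy_alphabet, 2))
```

In this toy setting, the peptides of mass 3 are (1,1,1), (1,2), and (2,1). Only (1,1,1) collects both nonzero prefix scores, so its score of 2 is reached with probability 0.125. A small spectral probability for the identified peptide's score would then suggest a match unlikely to arise by chance.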
