PepSplice: cache-efficient search algorithms for comprehensive identification of tandem mass spectra.

Authors

Roos Franz F, Jacob Riko, Grossmann Jonas, Fischer Bernd, Buhmann Joachim M, Gruissem Wilhelm, Baginsky Sacha, Widmayer Peter

Affiliations

Institute of Theoretical Computer Science, Institute of Plant Science, Institute of Computational Science, ETH Zurich, CH-8092 Zurich, Switzerland.

Publication

Bioinformatics. 2007 Nov 15;23(22):3016-23. doi: 10.1093/bioinformatics/btm417. Epub 2007 Sep 3.

DOI:10.1093/bioinformatics/btm417

PMID:17768164

Abstract

MOTIVATION

Tandem mass spectrometry allows for high-throughput identification of complex protein samples. Searching tandem mass spectra against sequence databases is the main analysis method nowadays. Since many peptide variations are possible, including them in the search space seems only logical. However, the search space usually grows exponentially with the number of independent variations and may therefore overwhelm computational resources.
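The exponential growth mentioned above can be illustrated with a simplified counting model. This model is an assumption for illustration and is not taken from the paper: suppose a peptide has $n$ positions, and position $i$ can either stay unchanged or take one of $m_i$ alternative forms, independently of the other positions. The symbols $n$, $m_i$ and $N$ are hypothetical names introduced here.

```latex
% Number of candidate peptide forms under the simplified model:
% each position contributes (1 + m_i) choices (unchanged, or one of m_i variants).
N = \prod_{i=1}^{n} (1 + m_i)

% Special case: one possible variation per position (m_i = 1 for all i).
N = 2^{n}
```

Under this model, ten independently variable positions already yield $2^{10} = 1024$ candidate forms for a single peptide.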

RESULTS

We provide fast, cache-efficient search algorithms to screen large peptide search spaces including non-tryptic peptides, whole genomes, dozens of posttranslational modifications, unannotated point mutations and even unannotated splice sites. All these search spaces can be screened simultaneously. By optimizing the cache usage, we achieve a calculation speed that closely approaches the limits of the hardware. At the same time, we control the size of the overall search space by limiting the combinations of variations that can co-occur on the same peptide. Using a hypergeometric scoring scheme, we applied these algorithms to a dataset of 1 420 632 spectra. We were able to identify a considerable number of peptide variations within a modest amount of computing time on standard desktop computers.
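Two ideas in the paragraph above can be sketched in a few lines of Python: capping how many variations may co-occur on one peptide, and scoring a match with a hypergeometric tail probability. This is a minimal sketch under stated assumptions, not PepSplice's implementation. The abstract does not specify the exact scoring formula or the cap that was used, so the function names, parameters, and the binned-spectrum model below are all hypothetical.

```python
from math import comb
from typing import Optional


def count_variant_forms(n_sites: int, max_variations: Optional[int] = None) -> int:
    """Count the ways to choose which of n_sites carry a variation.

    Assumes each site is simply "varied" or "not varied" (illustrative model).
    Without a cap the count is 2**n_sites; with a cap k it is
    sum_{i=0..k} C(n_sites, i), which grows polynomially for fixed k.
    """
    if max_variations is None or max_variations >= n_sites:
        return 2 ** n_sites
    return sum(comb(n_sites, i) for i in range(max_variations + 1))


def hypergeom_tail(n_bins: int, n_theoretical: int,
                   n_observed: int, n_matched: int) -> float:
    """P(X >= n_matched) for a hypergeometric random variable X.

    Assumed model: the m/z range is split into n_bins bins, n_theoretical of
    them contain predicted fragment masses, and n_observed bins contain
    measured peaks placed at random. A small tail probability means the
    observed number of matches is unlikely to arise by chance.
    """
    total = comb(n_bins, n_observed)
    upper = min(n_theoretical, n_observed)
    hits = sum(
        comb(n_theoretical, k) * comb(n_bins - n_theoretical, n_observed - k)
        for k in range(n_matched, upper + 1)
    )
    return hits / total


if __name__ == "__main__":
    # Unrestricted versus capped at two co-occurring variations, ten sites.
    print(count_variant_forms(10), count_variant_forms(10, 2))
    # Chance of matching all 3 predicted bins with 3 random peaks in 10 bins.
    print(hypergeom_tail(10, 3, 3, 3))
```

With ten variable sites, capping at two co-occurring variations reduces the count from 1024 to 56 under this model, which shows how such a limit keeps the overall search space manageable.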

Similar articles

A predictive model for identifying proteins by a single peptide match.

Bioinformatics. 2007 Feb 1;23(3):277-80. doi: 10.1093/bioinformatics/btl595. Epub 2006 Nov 22.

Large-scale database searching using tandem mass spectra: looking up the answer in the back of the book.

Nat Methods. 2004 Dec;1(3):195-202. doi: 10.1038/nmeth725.

MSDash: mass spectrometry database and search.

Comput Syst Bioinformatics Conf. 2008;7:63-71.

Fast tandem mass spectra-based protein identification regardless of the number of spectra or potential modifications examined.

Bioinformatics. 2005 May 15;21(10):2177-84. doi: 10.1093/bioinformatics/bti362. Epub 2005 Mar 3.

DBToolkit: processing protein databases for peptide-centric proteomics.

Bioinformatics. 2005 Sep 1;21(17):3584-5. doi: 10.1093/bioinformatics/bti588. Epub 2005 Jul 19.

pFind: a novel database-searching software system for automated peptide and protein identification via tandem mass spectrometry.

Bioinformatics. 2005 Jul 1;21(13):3049-50. doi: 10.1093/bioinformatics/bti439. Epub 2005 Apr 7.

pFind 2.0: a software package for peptide and protein identification via tandem mass spectrometry.

Rapid Commun Mass Spectrom. 2007;21(18):2985-91. doi: 10.1002/rcm.3173.

mMass data miner: an open source alternative for mass spectrometric data analysis.

Rapid Commun Mass Spectrom. 2008;22(6):905-8. doi: 10.1002/rcm.3444.

Feature selection in validating mass spectrometry database search results.

J Bioinform Comput Biol. 2008 Feb;6(1):223-40. doi: 10.1142/s0219720008003345.

Cited by

Proteogenomics: Integrating Next-Generation Sequencing and Mass Spectrometry to Characterize Human Proteomic Variation.

Annu Rev Anal Chem (Palo Alto Calif). 2016 Jun 12;9(1):521-45. doi: 10.1146/annurev-anchem-071015-041722. Epub 2016 Mar 30.

Making proteomics data accessible and reusable: current state of proteomics databases and repositories.

Proteomics. 2015 Mar;15(5-6):930-49. doi: 10.1002/pmic.201400302.

Deep proteome profiling of Trichoplax adhaerens reveals remarkable features at the origin of metazoan multicellularity.

Nat Commun. 2013;4:1408. doi: 10.1038/ncomms2424.

Inference and validation of protein identifications.

Mol Cell Proteomics. 2012 Nov;11(11):1097-104. doi: 10.1074/mcp.R111.014795. Epub 2012 Aug 3.

pep2pro: the high-throughput proteomics data processing, analysis, and visualization tool.

Front Plant Sci. 2012 Jun 11;3:123. doi: 10.3389/fpls.2012.00123. eCollection 2012.

Overcoming species boundaries in peptide identification with Bayesian information criterion-driven error-tolerant peptide search (BICEPS).

Mol Cell Proteomics. 2012 Jul;11(7):M111.014167. doi: 10.1074/mcp.M111.014167. Epub 2012 Apr 6.

Faster SEQUEST searching for peptide identification from tandem mass spectra.

J Proteome Res. 2011 Sep 2;10(9):3871-9. doi: 10.1021/pr101196n. Epub 2011 Jul 29.

Jasmonate controls polypeptide patterning in undamaged tissue in wounded Arabidopsis leaves.

Plant Physiol. 2011 Aug;156(4):1797-807. doi: 10.1104/pp.111.181008. Epub 2011 Jun 21.

Speeding up tandem mass spectrometry-based database searching by longest common prefix.

BMC Bioinformatics. 2010 Nov 25;11:577. doi: 10.1186/1471-2105-11-577.

A survey of computational methods and error rate estimation procedures for peptide and protein identification in shotgun proteomics.

J Proteomics. 2010 Oct 10;73(11):2092-123. doi: 10.1016/j.jprot.2010.08.009. Epub 2010 Sep 8.
