PepSplice: cache-efficient search algorithms for comprehensive identification of tandem mass spectra.

Author information

Roos Franz F, Jacob Riko, Grossmann Jonas, Fischer Bernd, Buhmann Joachim M, Gruissem Wilhelm, Baginsky Sacha, Widmayer Peter

Affiliations

Institute of Theoretical Computer Science, Institute of Plant Science, Institute of Computational Science, ETH Zurich, CH-8092 Zurich, Switzerland.

Publication information

Bioinformatics. 2007 Nov 15;23(22):3016-23. doi: 10.1093/bioinformatics/btm417. Epub 2007 Sep 3.

Abstract

MOTIVATION

Tandem mass spectrometry allows for high-throughput identification of complex protein samples. Searching tandem mass spectra against sequence databases is the main analysis method nowadays. Since many peptide variations are possible, including them in the search space seems only logical. However, the search space usually grows exponentially with the number of independent variations and may therefore overwhelm computational resources.
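
To make the exponential growth concrete, here is a toy count in Python. It is an illustrative assumption, not PepSplice's actual search-space model: a peptide is taken to have `sites` positions that vary independently, each either unchanged or carrying one of `kinds` variations, and the function names are hypothetical. Under that model the number of variant forms is (1 + kinds)^sites, so every additional independent variation multiplies the search space rather than adding to it.

```python
from itertools import product
from typing import Iterator, Tuple


def enumerate_variants(sites: int, kinds: int) -> Iterator[Tuple[int, ...]]:
    """Yield every variant of a toy peptide with `sites` independently variable positions.

    Each position holds 0 (unchanged) or one of the variation codes 1..kinds.
    """
    yield from product(range(kinds + 1), repeat=sites)


def variant_count(sites: int, kinds: int) -> int:
    """Closed form for the enumeration above: (1 + kinds) ** sites."""
    return (1 + kinds) ** sites


# Brute force and closed form agree on a small case (3 ** 6 == 729 variants).
assert sum(1 for _ in enumerate_variants(6, 2)) == variant_count(6, 2) == 729

# Each extra variable site multiplies the count by (1 + kinds).
for sites in (5, 10, 20, 30):
    print(sites, variant_count(sites, 2))
```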

RESULTS

We provide fast, cache-efficient search algorithms to screen large peptide search spaces including non-tryptic peptides, whole genomes, dozens of posttranslational modifications, unannotated point mutations and even unannotated splice sites. All these search spaces can be screened simultaneously. By optimizing the cache usage, we achieve a calculation speed that closely approaches the limits of the hardware. At the same time, we control the size of the overall search space by limiting the combinations of variations that can co-occur on the same peptide. Using a hypergeometric scoring scheme, we applied these algorithms to a dataset of 1 420 632 spectra. We were able to identify a considerable number of peptide variations within a modest amount of computing time on standard desktop computers.
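
The two ideas in this paragraph, capping how many variations may co-occur on one peptide and scoring candidates with a hypergeometric model, can be sketched as follows. This is a hedged illustration under stated assumptions, not the authors' implementation, since the abstract gives neither PepSplice's exact combination rules nor its scoring formula. `capped_variant_count` assumes a toy peptide whose `sites` positions can each be unchanged or carry one of `kinds` variations, with at most `max_cooccurring` varied positions allowed. `hypergeom_tail` and `hypergeom_score` assume a generic peak-matching model in which the m/z axis is split into `total_bins` bins, and ask how likely it is that at least `matched` of a candidate's `predicted` fragment bins land on the `observed` peak bins by chance. All names and parameters are hypothetical.

```python
import math
from typing import Optional


def capped_variant_count(sites: int, kinds: int, max_cooccurring: Optional[int] = None) -> int:
    """Count variants of a toy peptide with at most `max_cooccurring` varied positions.

    Choose i of the `sites` positions to vary and one of `kinds` variations for each;
    with no cap (None) the sum equals (1 + kinds) ** sites.
    """
    limit = sites if max_cooccurring is None else min(max_cooccurring, sites)
    return sum(math.comb(sites, i) * kinds**i for i in range(limit + 1))


def hypergeom_tail(total_bins: int, observed: int, predicted: int, matched: int) -> float:
    """P(X >= matched) for X ~ Hypergeometric(total_bins, observed, predicted).

    total_bins: number of m/z bins in the discretized spectrum
    observed:   bins containing an observed peak
    predicted:  bins containing a theoretical fragment of the candidate peptide
    matched:    predicted bins that coincide with observed peaks
    """
    upper = min(predicted, observed)
    tail = sum(
        math.comb(observed, i) * math.comb(total_bins - observed, predicted - i)
        for i in range(matched, upper + 1)
    )
    # Exact integer arithmetic until the final division.
    return tail / math.comb(total_bins, predicted)


def hypergeom_score(total_bins: int, observed: int, predicted: int, matched: int) -> float:
    """-log10 of the tail probability: higher means less likely to be a random match."""
    p = hypergeom_tail(total_bins, observed, predicted, matched)
    return -math.log10(p) if p > 0 else math.inf


# Capping co-occurring variations: 3 ** 30 uncapped vs. 1801 with at most two per peptide.
print(capped_variant_count(30, 2), capped_variant_count(30, 2, max_cooccurring=2))

# Scoring a hypothetical candidate: 8 of its 20 fragments hit 50 observed peaks among 1000 bins.
print(hypergeom_score(1000, 50, 20, 8))
```

With a fixed cap k, the count grows only polynomially in the number of variable sites (on the order of sites^k), which is the kind of control over the overall search space the paragraph describes. The bin width and the choice of fragment ion types are further assumptions that a real scorer would have to pin down.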
