Faster SEQUEST searching for peptide identification from tandem mass spectra.

Affiliation

Department of Computer Science and Engineering, University of Washington, Seattle, Washington, United States.

Publication information

J Proteome Res. 2011 Sep 2;10(9):3871-9. doi: 10.1021/pr101196n. Epub 2011 Jul 29.

DOI:10.1021/pr101196n

PMID:21761931

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3166376/

Abstract

Computational analysis of mass spectra remains the bottleneck in many proteomics experiments. SEQUEST was one of the earliest software packages to identify peptides from mass spectra by searching a database of known peptides. Though still popular, SEQUEST performs slowly. Crux and TurboSEQUEST have successfully sped up SEQUEST by adding a precomputed index to the search, but the demand for ever-faster peptide identification software continues to grow. Tide, introduced here, is a software program that implements the SEQUEST algorithm for peptide identification and that achieves a dramatic speedup over Crux and SEQUEST. The optimization strategies detailed here employ a combination of algorithmic and software engineering techniques to achieve speeds up to 170 times faster than a recent version of SEQUEST that uses indexing. For example, on a single Xeon CPU, Tide searches 10,000 spectra against a tryptic database of 27,499 Caenorhabditis elegans proteins at a rate of 1550 spectra per second, which compares favorably with a rate of 8.8 spectra per second for a recent version of SEQUEST with index running on the same hardware.
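To make the idea of a "precomputed index" concrete, here is a minimal Python sketch of one common form such an index can take: digest the protein database once, store the peptides sorted by mass, and answer each spectrum's query with a binary search over a precursor-mass window. It is an illustration under stated assumptions, not the implementation used by Tide, Crux, or TurboSEQUEST. The function names (`tryptic_peptides`, `build_index`, `candidates`), the no-missed-cleavage digestion rule, and the absolute mass tolerance are all hypothetical choices made for the example.

```python
import bisect

# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # mass of H2O added to the summed residue masses


def tryptic_peptides(protein):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal remainder
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def build_index(proteins):
    """Precompute a list of (mass, peptide) pairs sorted by mass."""
    unique = {pep for protein in proteins for pep in tryptic_peptides(protein)}
    return sorted((peptide_mass(pep), pep) for pep in unique)


def candidates(index, precursor_mass, tolerance):
    """Return peptides whose mass lies within +/- tolerance of the precursor."""
    lo = bisect.bisect_left(index, (precursor_mass - tolerance, ""))
    hi = bisect.bisect_right(index, (precursor_mass + tolerance, "\uffff"))
    return [pep for _, pep in index[lo:hi]]


if __name__ == "__main__":
    # Toy sequences chosen only for illustration.
    index = build_index(["MKWVTFISLLR", "AGKPR"])
    print(candidates(index, 277.146, 0.01))
```

Because the digestion and mass calculation happen once when the index is built, each spectrum costs only two binary searches plus the scoring of the few candidates inside its mass window, rather than a pass over the whole protein database.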

