Faster SEQUEST searching for peptide identification from tandem mass spectra.

Affiliation

Department of Computer Science and Engineering, University of Washington, Seattle, Washington, United States.

Publication information

J Proteome Res. 2011 Sep 2;10(9):3871-9. doi: 10.1021/pr101196n. Epub 2011 Jul 29.

Abstract

Computational analysis of mass spectra remains the bottleneck in many proteomics experiments. SEQUEST was one of the earliest software packages to identify peptides from mass spectra by searching a database of known peptides. Though still popular, SEQUEST performs slowly. Crux and TurboSEQUEST have successfully sped up SEQUEST by adding a precomputed index to the search, but the demand for ever-faster peptide identification software continues to grow. Tide, introduced here, is a software program that implements the SEQUEST algorithm for peptide identification and that achieves a dramatic speedup over Crux and SEQUEST. The optimization strategies detailed here employ a combination of algorithmic and software engineering techniques to achieve speeds up to 170 times faster than a recent version of SEQUEST that uses indexing. For example, on a single Xeon CPU, Tide searches 10,000 spectra against a tryptic database of 27,499 Caenorhabditis elegans proteins at a rate of 1550 spectra per second, which compares favorably with a rate of 8.8 spectra per second for a recent version of SEQUEST with index running on the same hardware.
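The abstract mentions that a precomputed index speeds up the search but does not describe its layout. As a rough illustration only, the sketch below assumes one common design: digest the proteins once, sort the unique peptides by mass, and use binary search to pull the candidates that fall within a precursor mass window. The function names, the toy protein sequences, and the tolerance are hypothetical and are not taken from Tide, Crux, or SEQUEST.

```python
import bisect

# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the terminal H and OH


def tryptic_peptides(protein):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def build_index(proteins):
    """Precompute parallel lists of masses and peptides, sorted by mass."""
    unique = {pep for prot in proteins for pep in tryptic_peptides(prot)}
    pairs = sorted((peptide_mass(p), p) for p in unique)
    return [m for m, _ in pairs], [p for _, p in pairs]


def candidates(index, precursor_mass, tolerance):
    """Return peptides whose mass lies within +/- tolerance of the precursor."""
    masses, peptides = index
    lo = bisect.bisect_left(masses, precursor_mass - tolerance)
    hi = bisect.bisect_right(masses, precursor_mass + tolerance)
    return peptides[lo:hi]


if __name__ == "__main__":
    index = build_index(["AAKRPGGR", "GGKAAK"])  # toy sequences, not real proteins
    print(candidates(index, 288.18, 0.01))
```

With an index like this, the digestion and sorting cost is paid once, and each spectrum needs only two binary searches to find its candidate peptides before any scoring is done. The scoring step itself (SEQUEST's cross-correlation) is not shown here.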

Similar articles

2
Speeding up tandem mass spectrometry based database searching by peptide and spectrum indexing.
Rapid Commun Mass Spectrom. 2010 Mar;24(6):807-14. doi: 10.1002/rcm.4448.
3
ProLuCID: An improved SEQUEST-like algorithm with enhanced sensitivity and specificity.
J Proteomics. 2015 Nov 3;129:16-24. doi: 10.1016/j.jprot.2015.07.001. Epub 2015 Jul 11.
5
Enhanced peptide quantification using spectral count clustering and cluster abundance.
BMC Bioinformatics. 2011 Oct 28;12:423. doi: 10.1186/1471-2105-12-423.
6
Rapid and accurate peptide identification from tandem mass spectra.
J Proteome Res. 2008 Jul;7(7):3022-7. doi: 10.1021/pr800127y. Epub 2008 May 28.
7
Using SEQUEST with theoretically complete sequence databases.
J Am Soc Mass Spectrom. 2015 Nov;26(11):1858-64. doi: 10.1007/s13361-015-1228-5. Epub 2015 Aug 4.
8
Crescendo: A Protein Sequence Database Search Engine for Tandem Mass Spectra.
J Am Soc Mass Spectrom. 2015 Jul;26(7):1077-84. doi: 10.1007/s13361-015-1120-3. Epub 2015 Apr 21.
9
Learning score function parameters for improved spectrum identification in tandem mass spectrometry experiments.
J Proteome Res. 2012 Sep 7;11(9):4499-508. doi: 10.1021/pr300234m. Epub 2012 Aug 15.

Cited by

5
PyViscount: Validating False Discovery Rate Estimation Methods via Random Search Space Partition.
J Proteome Res. 2025 Mar 7;24(3):1118-1134. doi: 10.1021/acs.jproteome.4c00743. Epub 2025 Feb 5.
9
Sequence-to-sequence translation from mass spectra to peptides with a transformer model.
Nat Commun. 2024 Jul 30;15(1):6427. doi: 10.1038/s41467-024-49731-x.
10
Making MS Omics Data ML-Ready: SpeCollate Protocols.
Methods Mol Biol. 2024;2836:135-155. doi: 10.1007/978-1-0716-4007-4_9.
