MyriMatch: highly accurate tandem mass spectral peptide identification by multivariate hypergeometric analysis.

Authors

Tabb David L, Fernando Christopher G, Chambers Matthew C

Affiliation

Mass Spectrometry Research Center / Departments of Biomedical Informatics and Biochemistry, Vanderbilt University Medical Center, Nashville, TN 37232-8575, USA.

Publication

J Proteome Res. 2007 Feb;6(2):654-61. doi: 10.1021/pr0604054.

PMID: 17269722

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2525619/

Abstract

Shotgun proteomics experiments are dependent upon database search engines to identify peptides from tandem mass spectra. Many of these algorithms score potential identifications by evaluating the number of fragment ions matched between each peptide sequence and an observed spectrum. These systems, however, generally do not distinguish between matching an intense peak and matching a minor peak. We have developed a statistical model to score peptide matches that is based upon the multivariate hypergeometric distribution. This scorer, part of the "MyriMatch" database search engine, places greater emphasis on matching intense peaks. The probability that the best match for each spectrum has occurred by random chance can be employed to separate correct matches from random ones. We evaluated this software on data sets from three different laboratories employing three different ion trap instruments. Employing a novel system for testing discrimination, we demonstrate that stratifying peaks into multiple intensity classes improves the discrimination of scoring. We compare MyriMatch results to those of Sequest and X!Tandem, revealing that it is capable of higher discrimination than either of these algorithms. When minimal peak filtering is employed, performance plummets for a scoring model that does not stratify matched peaks by intensity. On the other hand, we find that MyriMatch discrimination improves as more peaks are retained in each spectrum. MyriMatch also scales well to tandem mass spectra from high-resolution mass analyzers. These findings may indicate limitations for existing database search scorers that count matched peaks without differentiating them by intensity. The software and source code are available under the Mozilla Public License at http://www.mc.vanderbilt.edu/msrc/bioinformatics/.
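The scoring idea in the abstract can be illustrated with a minimal sketch of a multivariate hypergeometric probability. This is not the MyriMatch implementation: the function name `mvh_log_probability`, the three intensity classes, the class sizes, and the match counts are all hypothetical values chosen for illustration. The sketch treats a spectrum as a set of m/z bins partitioned into intensity classes plus a class of unoccupied bins, and asks how likely it is that a peptide's predicted fragments would land in those classes by chance.

```python
from math import comb, log


def mvh_log_probability(class_sizes, matches):
    """Natural log of the multivariate hypergeometric probability of drawing
    matches[i] items from each class i, where class i holds class_sizes[i]
    items and draws are made without replacement."""
    if len(class_sizes) != len(matches):
        raise ValueError("class_sizes and matches must have the same length")
    if any(k < 0 or k > n for n, k in zip(class_sizes, matches)):
        raise ValueError("each match count must lie between 0 and its class size")
    total = sum(class_sizes)   # all m/z bins in the spectrum
    drawn = sum(matches)       # all predicted fragment ions
    log_p = -log(comb(total, drawn))
    for n, k in zip(class_sizes, matches):
        log_p += log(comb(n, k))
    return log_p


# Hypothetical spectrum: 10 intense, 20 medium, and 40 weak peaks,
# plus 930 m/z bins that contain no peak (1000 bins in total).
class_sizes = [10, 20, 40, 930]

# Two hypothetical candidate peptides, each predicting 20 fragments and
# matching 8 peaks; the last entry counts fragments that match no peak.
mostly_intense = [6, 2, 0, 12]
mostly_weak = [0, 2, 6, 12]

log_p_intense = mvh_log_probability(class_sizes, mostly_intense)
log_p_weak = mvh_log_probability(class_sizes, mostly_weak)

# A lower chance probability means a more surprising, hence better, match.
print("score (intense matches):", -log_p_intense)
print("score (weak matches):   ", -log_p_weak)
```

Both candidates match the same number of peaks, so a scorer that only counts matches could not separate them. Under this sketch the candidate matching intense peaks gets the lower chance probability, because intense peaks are the scarcer class.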

Similar articles

A hypergeometric probability model for protein identification and validation using tandem mass spectral data and protein sequence databases.

Anal Chem. 2003 Aug 1;75(15):3792-8. doi: 10.1021/ac034157w.

Probability-based validation of protein identifications using a modified SEQUEST algorithm.

Anal Chem. 2002 Nov 1;74(21):5593-9. doi: 10.1021/ac025826t.

Denoising peptide tandem mass spectra for spectral libraries: a Bayesian approach.

J Proteome Res. 2013 Jul 5;12(7):3223-32. doi: 10.1021/pr400080b. Epub 2013 Jun 6.

DirecTag: accurate sequence tags from peptide MS/MS through statistical scoring.

J Proteome Res. 2008 Sep;7(9):3838-46. doi: 10.1021/pr800154p. Epub 2008 Jul 17.

Sipros Ensemble improves database searching and filtering for complex metaproteomics.

Bioinformatics. 2018 Mar 1;34(5):795-802. doi: 10.1093/bioinformatics/btx601.

[A novel approach for peptide identification by tandem mass spectrometry].

Sheng Wu Hua Xue Yu Sheng Wu Wu Li Xue Bao (Shanghai). 2003 Aug;35(8):734-40.

In-depth analysis of protein inference algorithms using multiple search engines and well-defined metrics.

J Proteomics. 2017 Jan 6;150:170-182. doi: 10.1016/j.jprot.2016.08.002. Epub 2016 Aug 4.

Statistical models for protein validation using tandem mass spectral data and protein amino acid sequence databases.

Anal Chem. 2004 Mar 15;76(6):1664-71. doi: 10.1021/ac035112y.

Evaluating de novo sequencing in proteomics: already an accurate alternative to database-driven peptide identification?

Brief Bioinform. 2018 Sep 28;19(5):954-970. doi: 10.1093/bib/bbx033.

Cited by

Open-Source and FAIR Research Software for Proteomics.

J Proteome Res. 2025 May 2;24(5):2222-2234. doi: 10.1021/acs.jproteome.4c01079. Epub 2025 Apr 23.

Comprehensive analysis of the effects of and deletion on post-translational modifications of fibrillar collagens in mouse skin.

Front Cell Dev Biol. 2025 Feb 28;13:1527839. doi: 10.3389/fcell.2025.1527839. eCollection 2025.

π-PrimeNovo: an accurate and efficient non-autoregressive deep learning model for de novo peptide sequencing.

Nat Commun. 2025 Jan 2;16(1):267. doi: 10.1038/s41467-024-55021-3.

Dear-PSM: A deep learning-based peptide search engine enables full database search for proteomics.

Smart Med. 2024 Aug 27;3(3):e20240014. doi: 10.1002/SMMD.20240014. eCollection 2024 Sep.

Functional surface expression of immunoglobulin cleavage systems in a candidate Mycoplasma vaccine chassis.

Commun Biol. 2024 Jun 28;7(1):779. doi: 10.1038/s42003-024-06497-8.

Bioinformatic Workflows for Metaproteomics.

Methods Mol Biol. 2024;2820:187-213. doi: 10.1007/978-1-0716-3910-8_16.

Rescoring Peptide Spectrum Matches: Boosting Proteomics Performance by Integrating Peptide Property Predictors Into Peptide Identification.

Mol Cell Proteomics. 2024 Jul;23(7):100798. doi: 10.1016/j.mcpro.2024.100798. Epub 2024 Jun 11.

A Comprehensive Understanding of Post-Translational Modification of Sox2 via Acetylation and -GlcNAcylation in Colorectal Cancer.

Cancers (Basel). 2024 Mar 3;16(5):1035. doi: 10.3390/cancers16051035.

Elucidation of Site-Specific Ubiquitination on Chaperones in Response to Mutant Huntingtin.

Cell Mol Neurobiol. 2023 Dec 15;44(1):3. doi: 10.1007/s10571-023-01446-1.

Accurate de novo peptide sequencing using fully convolutional neural networks.

Nat Commun. 2023 Dec 2;14(1):7974. doi: 10.1038/s41467-023-43010-x.
