Enhancing peptide identification confidence by combining search methods.

Authors

Alves Gelio, Wu Wells W, Wang Guanghui, Shen Rong-Fong, Yu Yi-Kuo

Affiliation

National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD 20894, USA.

Publication

J Proteome Res. 2008 Aug;7(8):3102-13. doi: 10.1021/pr700798h. Epub 2008 Jun 18.

DOI:10.1021/pr700798h

PMID:18558733

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2658881/

Abstract

Confident peptide identification is one of the most important components in mass-spectrometry-based proteomics. We propose a method to properly combine the results from different database search methods to enhance the accuracy of peptide identifications. The database search methods included in our analysis are SEQUEST (v27 rev12), ProbID (v1.0), InsPecT (v20060505), Mascot (v2.1), X! Tandem (v2007.07.01.2), OMSSA (v2.0) and RAId_DbS. Using two data sets, one collected in profile mode and one collected in centroid mode, we tested the search performance of all 21 combinations of two search methods as well as all 35 possible combinations of three search methods. The results obtained from our study suggest that properly combining search methods does improve retrieval accuracy. In addition to performance results, we also describe the theoretical framework which in principle allows one to combine many independent scoring methods including de novo sequencing and spectral library searches. The correlations among different methods are also investigated in terms of common true positives, common false positives, and a global analysis. We find that the average correlation strength, between any pairwise combination of the seven methods studied, is usually smaller than the associated standard error. This indicates only weak correlation may be present among different methods and validates our approach in combining the search results. The usefulness of our approach is further confirmed by showing that the average cumulative number of false positive peptides agrees reasonably well with the combined E-value. The data related to this study are freely available upon request.
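The combination idea in the abstract can be illustrated with a short Python sketch. It assumes, as the abstract argues from the weak measured correlations, that the search methods score a peptide independently, and it uses the standard closed form for the distribution of a product of independent uniform p-values (equivalent to Fisher's method). This is an illustration of that general technique, not a reproduction of the authors' code; the function name `combined_pvalue` is hypothetical. The method list is taken from the abstract and is used only to confirm the 21 pairwise and 35 three-way combinations that were tested.

```python
import math
from itertools import combinations

# The seven database search methods named in the abstract.
METHODS = ["SEQUEST", "ProbID", "InsPecT", "Mascot", "X! Tandem", "OMSSA", "RAId_DbS"]


def combined_pvalue(pvalues):
    """Combine independent p-values for one peptide (hypothetical helper).

    For L independent uniform p-values with product tau, the probability that
    the product is <= tau is tau * sum_{k=0}^{L-1} (-ln tau)^k / k!, which is
    equivalent to Fisher's method. Independence is an assumption.
    """
    if not pvalues:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    log_tau = sum(math.log(p) for p in pvalues)  # ln(tau), always <= 0
    x = -log_tau
    total = 0.0
    term = 1.0  # (-ln tau)^k / k!, starting at k = 0
    for k in range(len(pvalues)):
        if k > 0:
            term *= x / k
        total += term
    return math.exp(log_tau) * total


if __name__ == "__main__":
    # Number of method pairs and triples evaluated in the study.
    print(len(list(combinations(METHODS, 2))), len(list(combinations(METHODS, 3))))
    # Two methods that each report p = 0.01 for the same peptide.
    print(combined_pvalue([0.01, 0.01]))
```

With a single method the function returns that method's p-value unchanged. Two agreeing methods at p = 0.01 each combine to roughly 1.0e-3, which shows how agreement between nearly independent methods can sharpen confidence. Correlated methods would violate the independence assumption and overstate that gain, which is why the abstract checks correlation strength first.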


Similar articles

Calibrating E-values for MS2 database search methods.

Biol Direct. 2007 Nov 5;2:26. doi: 10.1186/1745-6150-2-26.

MassMatrix: a database search program for rapid characterization of proteins and peptides from tandem mass spectrometry data.

Proteomics. 2009 Mar;9(6):1548-55. doi: 10.1002/pmic.200700322.

Combining De Novo Peptide Sequencing Algorithms, A Synergistic Approach to Boost Both Identifications and Confidence in Bottom-up Proteomics.

J Proteome Res. 2017 Sep 1;16(9):3209-3218. doi: 10.1021/acs.jproteome.7b00198. Epub 2017 Aug 22.

Optimization of Search Engines and Postprocessing Approaches to Maximize Peptide and Protein Identification for High-Resolution Mass Data.

J Proteome Res. 2015 Nov 6;14(11):4662-73. doi: 10.1021/acs.jproteome.5b00536. Epub 2015 Sep 30.

In-depth analysis of protein inference algorithms using multiple search engines and well-defined metrics.

J Proteomics. 2017 Jan 6;150:170-182. doi: 10.1016/j.jprot.2016.08.002. Epub 2016 Aug 4.

MassWiz: a novel scoring algorithm with target-decoy based analysis pipeline for tandem mass spectrometry.

J Proteome Res. 2011 May 6;10(5):2154-60. doi: 10.1021/pr200031z. Epub 2011 Apr 5.

False Discovery Rate Estimation for Hybrid Mass Spectral Library Search Identifications in Bottom-up Proteomics.

J Proteome Res. 2019 Sep 6;18(9):3223-3234. doi: 10.1021/acs.jproteome.8b00863. Epub 2019 Aug 14.

Statistical validation of peptide identifications in large-scale proteomics using the target-decoy database search strategy and flexible mixture modeling.

J Proteome Res. 2008 Jan;7(1):286-92. doi: 10.1021/pr7006818. Epub 2007 Dec 14.

Comparison of Mascot and X!Tandem performance for low and high accuracy mass spectrometry and the development of an adjusted Mascot threshold.

Mol Cell Proteomics. 2008 May;7(5):962-70. doi: 10.1074/mcp.M700293-MCP200. Epub 2008 Jan 23.

Cited by

Discovering Novel Proteoforms Using Proteogenomic Workflows Within the Galaxy Bioinformatics Platform.

Methods Mol Biol. 2025;2859:109-128. doi: 10.1007/978-1-0716-4152-1_7.

Comparative Proteomic Profiling of Secreted Extracellular Vesicles from Breast Fibroadenoma and Malignant Lesions: A Pilot Study.

Int J Mol Sci. 2022 Apr 3;23(7):3989. doi: 10.3390/ijms23073989.

Deep Learning-based MSMS Spectra Reduction in Support of Running Multiple Protein Search Engines on Cloud.

Proceedings (IEEE Int Conf Bioinformatics Biomed). 2017 Nov;2017:1909-1914. doi: 10.1109/bibm.2017.8217951. Epub 2017 Dec 18.

Robust Accurate Identification and Biomass Estimates of Microorganisms via Tandem Mass Spectrometry.

J Am Soc Mass Spectrom. 2020 Jan 2;31(1):85-102. doi: 10.1021/jasms.9b00035. Epub 2019 Nov 20.

A high-speed search engine pLink 2 with systematic evaluation for proteome-scale identification of cross-linked peptides.

Nat Commun. 2019 Jul 30;10(1):3404. doi: 10.1038/s41467-019-11337-z.

Combining High-Resolution and Exact Calibration To Boost Statistical Power: A Well-Calibrated Score Function for High-Resolution MS2 Data.

J Proteome Res. 2018 Nov 2;17(11):3644-3656. doi: 10.1021/acs.jproteome.8b00206. Epub 2018 Oct 18.

Rapid Classification and Identification of Multiple Microorganisms with Accurate Statistical Significance via High-Resolution Tandem Mass Spectrometry.

J Am Soc Mass Spectrom. 2018 Aug;29(8):1721-1737. doi: 10.1007/s13361-018-1986-y. Epub 2018 Jun 5.

A graphical user interface for RAId, a knowledge integrated proteomics analysis suite with accurate statistics.

BMC Res Notes. 2018 Mar 15;11(1):182. doi: 10.1186/s13104-018-3289-6.

A large scale Plasmodium vivax- Saimiri boliviensis trophozoite-schizont transition proteome.

PLoS One. 2017 Aug 22;12(8):e0182561. doi: 10.1371/journal.pone.0182561. eCollection 2017.

A multi-model statistical approach for proteomic spectral count quantitation.

J Proteomics. 2016 Jul 20;144:23-32. doi: 10.1016/j.jprot.2016.05.032. Epub 2016 May 31.

References

An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database.

J Am Soc Mass Spectrom. 1994 Nov;5(11):976-89. doi: 10.1016/1044-0305(94)80016-2.

Calibrating E-values for MS2 database search methods.

Biol Direct. 2007 Nov 5;2:26. doi: 10.1186/1745-6150-2-26.

RAId_DbS: peptide identification using database searches with realistic statistics.

Biol Direct. 2007 Oct 25;2:25. doi: 10.1186/1745-6150-2-25.

Retrieval accuracy, statistical significance and compositional similarity in protein sequence database searches.

Nucleic Acids Res. 2006;34(20):5966-73. doi: 10.1093/nar/gkl731. Epub 2006 Oct 26.

An evaluation, comparison, and accurate benchmarking of several publicly available MS/MS search algorithms: sensitivity and specificity analysis.

Proteomics. 2005 Aug;5(13):3475-90. doi: 10.1002/pmic.200500126.

InsPecT: identification of posttranslationally modified peptides from tandem mass spectra.

Anal Chem. 2005 Jul 15;77(14):4626-39. doi: 10.1021/ac050102d.

Open mass spectrometry search algorithm.

J Proteome Res. 2004 Sep-Oct;3(5):958-64. doi: 10.1021/pr0499491.

TANDEM: matching proteins with tandem mass spectra.

Bioinformatics. 2004 Jun 12;20(9):1466-7. doi: 10.1093/bioinformatics/bth092. Epub 2004 Feb 19.

ProbID: a probabilistic algorithm to identify peptides through sequence database searching using tandem mass spectral data.

Proteomics. 2002 Oct;2(10):1406-12. doi: 10.1002/1615-9861(200210)2:10<1406::AID-PROT1406>3.0.CO;2-9.

Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search.

Anal Chem. 2002 Oct 15;74(20):5383-92. doi: 10.1021/ac025747h.
