
Enhancing peptide identification confidence by combining search methods.

Authors

Alves Gelio, Wu Wells W, Wang Guanghui, Shen Rong-Fong, Yu Yi-Kuo

Affiliation

National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD 20894, USA.

Publication information

J Proteome Res. 2008 Aug;7(8):3102-13. doi: 10.1021/pr700798h. Epub 2008 Jun 18.

DOI: 10.1021/pr700798h
PMID: 18558733
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2658881/
Abstract

Confident peptide identification is one of the most important components in mass-spectrometry-based proteomics. We propose a method to properly combine the results from different database search methods to enhance the accuracy of peptide identifications. The database search methods included in our analysis are SEQUEST (v27 rev12), ProbID (v1.0), InsPecT (v20060505), Mascot (v2.1), X! Tandem (v2007.07.01.2), OMSSA (v2.0) and RAId_DbS. Using two data sets, one collected in profile mode and one collected in centroid mode, we tested the search performance of all 21 combinations of two search methods as well as all 35 possible combinations of three search methods. The results obtained from our study suggest that properly combining search methods does improve retrieval accuracy. In addition to performance results, we also describe the theoretical framework which in principle allows one to combine many independent scoring methods including de novo sequencing and spectral library searches. The correlations among different methods are also investigated in terms of common true positives, common false positives, and a global analysis. We find that the average correlation strength, between any pairwise combination of the seven methods studied, is usually smaller than the associated standard error. This indicates only weak correlation may be present among different methods and validates our approach in combining the search results. The usefulness of our approach is further confirmed by showing that the average cumulative number of false positive peptides agrees reasonably well with the combined E-value. The data related to this study are freely available upon request.
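
The abstract's counts and its idea of combining independent scores can be illustrated with a short sketch. The code below is not the authors' implementation: it assumes that each method reports an E-value, that E-values map to P-values via P = 1 - exp(-E), and that P-values from independent methods are combined with the classical product rule (equivalent to Fisher's method). The function names are hypothetical, and the abstract does not state which combination formula the paper uses.

```python
import math
from itertools import combinations

# The seven search methods named in the abstract.
METHODS = ["SEQUEST", "ProbID", "InsPecT", "Mascot",
           "X! Tandem", "OMSSA", "RAId_DbS"]


def e_to_p(e_value):
    """Assumed conversion: P = 1 - exp(-E) (Poisson-distributed chance hits)."""
    return -math.expm1(-e_value)


def p_to_e(p_value):
    """Inverse of e_to_p; returns infinity when P reaches 1."""
    if p_value >= 1.0:
        return math.inf
    return -math.log1p(-p_value)


def combine_p_values(p_values):
    """Probability that a product of n independent uniform(0, 1) variables
    is at most the observed product (equivalent to Fisher's method)."""
    tau = math.prod(p_values)
    if tau <= 0.0:
        return 0.0
    log_term = -math.log(tau)
    n = len(p_values)
    series = sum(log_term ** k / math.factorial(k) for k in range(n))
    return min(1.0, tau * series)


def combined_e_value(e_values):
    """Combine per-method E-values into one E-value (illustrative only)."""
    return p_to_e(combine_p_values([e_to_p(e) for e in e_values]))


if __name__ == "__main__":
    pairs = list(combinations(METHODS, 2))
    triples = list(combinations(METHODS, 3))
    print(len(pairs), len(triples))  # 21 pairs and 35 triples
    # Hypothetical E-values for one peptide from two methods.
    print(combined_e_value([0.01, 0.01]))
```

Choosing 2 or 3 methods out of 7 gives the 21 and 35 combinations quoted in the abstract. Under the independence assumption, two P-values of 0.1 combine to about 0.056, so agreement between methods strengthens the evidence; if the methods were strongly correlated, this rule would overstate significance, which is why the abstract examines the correlations among methods.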

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8c4a/2658881/6e0aef466f55/pr-2007-00798h_0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8c4a/2658881/73d939c654f2/pr-2007-00798h_0003.jpg
