Tailor: A Nonparametric and Rapid Score Calibration Method for Database Search-Based Peptide Identification in Shotgun Proteomics.

Affiliation

Department of Data Analysis and Artificial Intelligence, Faculty of Computer Science, National Research University Higher School of Economics (HSE), 11 Pokrovsky Boulevard, Moscow 109028, Russian Federation.

Publication information

J Proteome Res. 2020 Apr 3;19(4):1481-1490. doi: 10.1021/acs.jproteome.9b00736. Epub 2020 Mar 25.

DOI: 10.1021/acs.jproteome.9b00736
PMID: 32175744
Abstract

Peptide-spectrum-match (PSM) scores used in database searching are calibrated to spectrum- or spectrum-peptide-specific null distributions. Some calibration methods rely on specific assumptions and use analytical models (e.g., binomial distributions), whereas other methods utilize exact empirical null distributions. The former may be inaccurate because of unjustified assumptions, while the latter are accurate, albeit computationally exhaustive. Here, we introduce a novel, nonparametric, heuristic PSM score calibration method, called Tailor, which calibrates PSM scores by dividing them by the top 100-quantile of the empirical, spectrum-specific null distributions (i.e., the score with an associated p-value of 0.01 at the tail, hence the name) observed during database searching. Tailor does not require any optimization steps or long calculations, and it does not rely on any assumptions about the form of the score distribution (e.g., whether it is binomial); however, it relies on our empirical observation that the mean and the variance of the null distributions are correlated. In our benchmark, we re-calibrated the match scores of XCorr from Crux, HyperScore scores from X!Tandem, and the p-values from OMSSA with the Tailor method and obtained more spectrum annotations than with raw scores at any false discovery rate level. Moreover, Tailor provided slightly more annotations than the E-values of X!Tandem and OMSSA and approached the performance of the computationally exhaustive exact p-value method for XCorr on spectrum data sets containing low-resolution fragmentation information (MS2), while running around 20-150 times faster. On high-resolution MS2 data sets, the Tailor method with XCorr achieved state-of-the-art performance and produced more annotations than the well-calibrated residue-evidence (Res-ev) score, while running around 50-80 times faster.
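The calibration step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `tailor_calibrate`, the use of `numpy.quantile` to estimate the tail score, and the choice to treat all candidate scores of one spectrum as its empirical null distribution are assumptions made for illustration only.

```python
import numpy as np


def tailor_calibrate(candidate_scores, tail_quantile=0.99):
    """Divide one spectrum's PSM scores by their tail-quantile score.

    Hypothetical helper: `candidate_scores` holds the raw scores of all
    candidate peptides matched to a single spectrum, assumed here to
    approximate that spectrum's empirical null distribution.
    """
    scores = np.asarray(candidate_scores, dtype=float)
    if scores.size == 0:
        raise ValueError("at least one candidate score is required")

    # Score at the top 1% of the distribution (empirical p-value of about 0.01).
    reference = np.quantile(scores, tail_quantile)
    if reference <= 0:
        raise ValueError("tail-quantile score must be positive to divide by it")

    return scores / reference


# Two spectra whose raw scores live on different scales.
spectrum_a = np.arange(1.0, 101.0)        # raw scores 1..100
spectrum_b = 3.0 * np.arange(1.0, 101.0)  # same shape, three times the scale

calibrated_a = tailor_calibrate(spectrum_a)
calibrated_b = tailor_calibrate(spectrum_b)

# After calibration the two spectra are directly comparable.
print(np.allclose(calibrated_a, calibrated_b))
```

Because each spectrum is rescaled by its own tail score, a calibrated value near 1 marks a match at roughly the top 1% of that spectrum's candidates, regardless of how large its raw scores were. No distribution is fitted, which is consistent with the nonparametric, low-cost design the abstract describes.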

Similar articles

1. Tailor: A Nonparametric and Rapid Score Calibration Method for Database Search-Based Peptide Identification in Shotgun Proteomics.
   J Proteome Res. 2020 Apr 3;19(4):1481-1490. doi: 10.1021/acs.jproteome.9b00736. Epub 2020 Mar 25.
2. Combining High-Resolution and Exact Calibration To Boost Statistical Power: A Well-Calibrated Score Function for High-Resolution MS2 Data.
   J Proteome Res. 2018 Nov 2;17(11):3644-3656. doi: 10.1021/acs.jproteome.8b00206. Epub 2018 Oct 18.
3. Exact p-value calculation for XCorr scoring of high-resolution MS/MS data.
   Proteomics. 2024 Mar;24(5):e2300145. doi: 10.1002/pmic.202300145. Epub 2023 Sep 19.
4. Statistical calibration of the SEQUEST XCorr function.
   J Proteome Res. 2009 Apr;8(4):2106-13. doi: 10.1021/pr8011107.
5. On the importance of well-calibrated scores for identifying shotgun proteomics spectra.
   J Proteome Res. 2015 Feb 6;14(2):1147-60. doi: 10.1021/pr5010983. Epub 2014 Dec 17.
6. Annotation of tandem mass spectrometry data using stochastic neural networks in shotgun proteomics.
   Bioinformatics. 2020 Jun 1;36(12):3781-3787. doi: 10.1093/bioinformatics/btaa206.
7. Computing exact p-values for a cross-correlation shotgun proteomics score function.
   Mol Cell Proteomics. 2014 Sep;13(9):2467-79. doi: 10.1074/mcp.O113.036327. Epub 2014 Jun 2.
8. Transfer posterior error probability estimation for peptide identification.
   BMC Bioinformatics. 2020 May 4;21(1):173. doi: 10.1186/s12859-020-3485-y.
9. Exhaustively Identifying Cross-Linked Peptides with a Linear Computational Complexity.
   J Proteome Res. 2017 Oct 6;16(10):3942-3952. doi: 10.1021/acs.jproteome.7b00338. Epub 2017 Sep 1.
10. Determining the calibration of confidence estimation procedures for unique peptides in shotgun proteomics.
    J Proteomics. 2013 Mar 27;80:123-31. doi: 10.1016/j.jprot.2012.12.007. Epub 2012 Dec 23.

Cited by

1. Assessment of false discovery rate control in tandem mass spectrometry analysis using entrapment.
   Nat Methods. 2025 Jun 16. doi: 10.1038/s41592-025-02719-x.
2. A multi-species benchmark for training and validating mass spectrometry proteomics machine learning models.
   Sci Data. 2024 Nov 8;11(1):1207. doi: 10.1038/s41597-024-04068-4.
3. Sequence-to-sequence translation from mass spectra to peptides with a transformer model.
   Nat Commun. 2024 Jul 30;15(1):6427. doi: 10.1038/s41467-024-49731-x.
4. Assessment of false discovery rate control in tandem mass spectrometry analysis using entrapment.
   bioRxiv. 2025 Jan 21:2024.06.01.596967. doi: 10.1101/2024.06.01.596967.
5. Rescoring Peptide Spectrum Matches: Boosting Proteomics Performance by Integrating Peptide Property Predictors Into Peptide Identification.
   Mol Cell Proteomics. 2024 Jul;23(7):100798. doi: 10.1016/j.mcpro.2024.100798. Epub 2024 Jun 11.
6. FineFDR: Fine-grained Taxonomy-specific False Discovery Rates Control in Metaproteomics.
   Proceedings (IEEE Int Conf Bioinformatics Biomed). 2022 Dec;2022:287-292. doi: 10.1109/bibm55620.2022.9995401. Epub 2023 Jan 2.
7. The Crux Toolkit for Analysis of Bottom-Up Tandem Mass Spectrometry Proteomics Data.
   J Proteome Res. 2023 Feb 3;22(2):561-569. doi: 10.1021/acs.jproteome.2c00615. Epub 2023 Jan 4.
8. Improving Peptide-Level Mass Spectrometry Analysis via Double Competition.
   J Proteome Res. 2022 Oct 7;21(10):2412-2420. doi: 10.1021/acs.jproteome.2c00282. Epub 2022 Sep 27.
9. Building Spectral Libraries from Narrow-Window Data-Independent Acquisition Mass Spectrometry Data.
   J Proteome Res. 2022 Jun 3;21(6):1382-1391. doi: 10.1021/acs.jproteome.1c00895. Epub 2022 May 12.
10. TIDD: tool-independent and data-dependent machine learning for peptide identification.
    BMC Bioinformatics. 2022 Mar 30;23(1):109. doi: 10.1186/s12859-022-04640-y.