
Deep-Learning-Derived Evaluation Metrics Enable Effective Benchmarking of Computational Tools for Phosphopeptide Identification.

Affiliations

Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas, USA.

Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany.

Publication details

Mol Cell Proteomics. 2021;20:100171. doi: 10.1016/j.mcpro.2021.100171. Epub 2021 Nov 1.

DOI: 10.1016/j.mcpro.2021.100171

PMID: 34737085

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC8609164/

Abstract

Tandem mass spectrometry (MS/MS)-based phosphoproteomics is a powerful technology for global phosphorylation analysis. However, applying four computational pipelines to a typical mass spectrometry (MS)-based phosphoproteomic dataset from a human cancer study, we observed a large discrepancy among the reported phosphopeptide identification and phosphosite localization results, underscoring a critical need for benchmarking. While efforts have been made to compare the performance of computational pipelines using data from synthetic phosphopeptides, evaluations involving real application data have been largely limited to comparing the numbers of phosphopeptide identifications due to the lack of appropriate evaluation metrics. We investigated three deep-learning-derived features as potential evaluation metrics: phosphosite probability, Delta RT, and spectral similarity. Predicted phosphosite probability is computed by MusiteDeep, which provides high accuracy as previously reported; Delta RT is defined as the absolute difference between the observed retention time (RT) and the RT predicted by AutoRT; and spectral similarity is defined as the Pearson correlation coefficient between the observed spectrum and the spectrum predicted by pDeep2. Using a synthetic peptide dataset, we found that both Delta RT and spectral similarity provided excellent discrimination between correct and incorrect peptide-spectrum matches (PSMs), both when incorrect PSMs involved wrong peptide sequences and even when incorrect PSMs were caused only by incorrect phosphosite localization. Based on these results, we used all three deep-learning-derived features as evaluation metrics to compare different computational pipelines on a diverse set of phosphoproteomic datasets and showed their utility in benchmarking the performance of the pipelines.
The benchmark metrics demonstrated in this study will enable users to select computational pipelines and parameters for routine analysis of phosphoproteomics data and will offer guidance for developers to improve computational methods.
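Once the deep-learning predictions exist, the Delta RT and spectral similarity metrics defined above reduce to simple arithmetic. The sketch below is a minimal illustration under stated assumptions, not the authors' code: it assumes predicted RTs and predicted fragment intensities have already been obtained (in the study, from AutoRT and pDeep2), and that observed and predicted intensities are aligned to the same fragment ions, with unmatched ions set to 0. The helper names `delta_rt`, `spectral_similarity` and `discrimination_auc` are hypothetical. The AUC function is one common way (a pairwise ROC AUC) to quantify how well a metric separates correct from incorrect PSMs; the abstract does not state which discrimination statistic the authors used.

```python
import math
from typing import Sequence


def delta_rt(observed_rt: float, predicted_rt: float) -> float:
    """Absolute difference between observed and predicted retention time.

    Both values must be in the same units (e.g. minutes). In the study the
    prediction comes from AutoRT; here it is simply passed in.
    """
    return abs(observed_rt - predicted_rt)


def spectral_similarity(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Pearson correlation between observed and predicted fragment-ion intensities.

    Assumption: both vectors are already aligned to the same fragment ions
    (unmatched ions filled with 0). Returns NaN if either vector is constant.
    """
    if len(observed) != len(predicted):
        raise ValueError("intensity vectors must be aligned to the same ions")
    n = len(observed)
    if n < 2:
        raise ValueError("need at least two fragment ions")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        return float("nan")
    return cov / math.sqrt(var_o * var_p)


def discrimination_auc(correct: Sequence[float], incorrect: Sequence[float],
                       higher_is_better: bool = True) -> float:
    """Pairwise ROC AUC: chance that a random correct PSM scores better than a
    random incorrect PSM (ties count as 0.5).

    Set higher_is_better=False for metrics such as Delta RT, where smaller
    values indicate a better match.
    """
    if not correct or not incorrect:
        raise ValueError("need at least one PSM in each group")
    sign = 1.0 if higher_is_better else -1.0
    wins = 0.0
    for c in correct:
        for i in incorrect:
            diff = sign * (c - i)
            if diff > 0:
                wins += 1.0
            elif diff == 0:
                wins += 0.5
    return wins / (len(correct) * len(incorrect))


if __name__ == "__main__":
    # Toy, illustrative values only (not data from the study).
    print(delta_rt(30.0, 28.5))                                              # 1.5
    print(spectral_similarity([1, 2, 3], [2, 4, 6]))                         # 1.0
    print(discrimination_auc([0.9, 0.8], [0.3, 0.8]))                        # 0.875
    print(discrimination_auc([0.5, 1.0], [5.0, 8.0], higher_is_better=False))  # 1.0
```

Perfectly proportional intensity vectors give a similarity of 1.0 because Pearson correlation ignores overall intensity scale, which is convenient when predicted intensities are normalized differently from observed ones. The `higher_is_better` flag lets one AUC routine score both a similarity (higher is better) and a Delta RT (lower is better).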

Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5d09/8609164/51b7fae47804/fx1.jpg

Similar articles

Deep-Learning-Derived Evaluation Metrics Enable Effective Benchmarking of Computational Tools for Phosphopeptide Identification.

Mol Cell Proteomics. 2021;20:100171. doi: 10.1016/j.mcpro.2021.100171. Epub 2021 Nov 1.

Deep Learning Prediction Boosts Phosphoproteomics-Based Discoveries Through Improved Phosphopeptide Identification.

Mol Cell Proteomics. 2024 Feb;23(2):100707. doi: 10.1016/j.mcpro.2023.100707. Epub 2023 Dec 26.

Deep learning prediction boosts phosphoproteomics-based discoveries through improved phosphopeptide identification.

bioRxiv. 2023 Jan 12:2023.01.11.523329. doi: 10.1101/2023.01.11.523329.

Capillary Zone Electrophoresis-Tandem Mass Spectrometry for Large-Scale Phosphoproteomics with the Production of over 11,000 Phosphopeptides from the Colon Carcinoma HCT116 Cell Line.

Anal Chem. 2019 Feb 5;91(3):2201-2208. doi: 10.1021/acs.analchem.8b04770. Epub 2019 Jan 23.

Confident and sensitive phosphoproteomics using combinations of collision induced dissociation and electron transfer dissociation.

J Proteomics. 2014 May 30;103(100):1-14. doi: 10.1016/j.jprot.2014.03.010. Epub 2014 Mar 21.

Reference-facilitated phosphoproteomics: fast and reliable phosphopeptide validation by microLC-ESI-Q-TOF MS/MS.

Mol Cell Proteomics. 2007 Aug;6(8):1380-91. doi: 10.1074/mcp.M600480-MCP200. Epub 2007 May 17.

DeepFLR facilitates false localization rate control in phosphoproteomics.

Nat Commun. 2023 Apr 20;14(1):2269. doi: 10.1038/s41467-023-38035-1.

Deep learning embedder method and tool for mass spectra similarity search.

J Proteomics. 2021 Feb 10;232:104070. doi: 10.1016/j.jprot.2020.104070. Epub 2020 Dec 8.

Evaluation of Parameters for Confident Phosphorylation Site Localization Using an Orbitrap Fusion Tribrid Mass Spectrometer.

J Proteome Res. 2017 Sep 1;16(9):3448-3459. doi: 10.1021/acs.jproteome.7b00337. Epub 2017 Aug 11.

Colander: a probability-based support vector machine algorithm for automatic screening for CID spectra of phosphopeptides prior to database search.

J Proteome Res. 2008 Aug;7(8):3628-34. doi: 10.1021/pr8001194. Epub 2008 Jun 19.

Cited by

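Deep Learning Prediction Boosts Phosphoproteomics-Based Discoveries Through Improved Phosphopeptide Identification.

Mol Cell Proteomics. 2024 Feb;23(2):100707. doi: 10.1016/j.mcpro.2023.100707. Epub 2023 Dec 26.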

Mapping protein states and interactions across the tree of life with co-fractionation mass spectrometry.

Nat Commun. 2023 Dec 15;14(1):8365. doi: 10.1038/s41467-023-44139-5.

Considerations for defining +80 Da mass shifts in mass spectrometry-based proteomics: phosphorylation and beyond.

Chem Commun (Camb). 2023 Sep 26;59(77):11484-11499. doi: 10.1039/d3cc02909c.

Proteogenomic data and resources for pan-cancer analysis.

Cancer Cell. 2023 Aug 14;41(8):1397-1406. doi: 10.1016/j.ccell.2023.06.009.

DeepFLR facilitates false localization rate control in phosphoproteomics.

Nat Commun. 2023 Apr 20;14(1):2269. doi: 10.1038/s41467-023-38035-1.

A multi-purpose, regenerable, proteome-scale, human phosphoserine resource for phosphoproteomics.

Nat Methods. 2022 Nov;19(11):1371-1375. doi: 10.1038/s41592-022-01638-5. Epub 2022 Oct 24.
