Deep-Learning-Derived Evaluation Metrics Enable Effective Benchmarking of Computational Tools for Phosphopeptide Identification.

Affiliations

Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas, USA.

Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany.

Publication information

Mol Cell Proteomics. 2021;20:100171. doi: 10.1016/j.mcpro.2021.100171. Epub 2021 Nov 1.

Abstract

Tandem mass spectrometry (MS/MS)-based phosphoproteomics is a powerful technology for global phosphorylation analysis. However, when we applied four computational pipelines to a typical mass spectrometry (MS)-based phosphoproteomic dataset from a human cancer study, we observed large discrepancies among the reported phosphopeptide identification and phosphosite localization results, underscoring a critical need for benchmarking. While efforts have been made to compare the performance of computational pipelines using data from synthetic phosphopeptides, evaluations involving real application data have been largely limited to comparing the numbers of phosphopeptide identifications, owing to the lack of appropriate evaluation metrics. We investigated three deep-learning-derived features as potential evaluation metrics: phosphosite probability, Delta RT, and spectral similarity. Predicted phosphosite probability is computed by MusiteDeep, which has previously been reported to provide high accuracy; Delta RT is defined as the absolute difference between the observed retention time (RT) and the RT predicted by AutoRT; and spectral similarity is defined as the Pearson correlation coefficient between the observed spectrum and the spectrum predicted by pDeep2. Using a synthetic peptide dataset, we found that both Delta RT and spectral similarity provided excellent discrimination between correct and incorrect peptide-spectrum matches (PSMs), not only when incorrect PSMs involved wrong peptide sequences but even when they were caused solely by incorrect phosphosite localization. Based on these results, we used all three deep-learning-derived features as evaluation metrics to compare different computational pipelines on a diverse set of phosphoproteomic datasets and showed their utility in benchmarking pipeline performance.
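Once model predictions are available, the Delta RT and spectral similarity definitions reduce to simple arithmetic. The Python sketch below assumes the AutoRT-predicted RT and the pDeep2-predicted fragment intensities have already been obtained; it does not call either tool. It also assumes a hypothetical spectrum representation, a dictionary of intensities keyed by fragment-ion label (e.g. "b2", "y3"), with ions missing from one spectrum treated as zero intensity; the abstract does not say how observed peaks were matched to predictions. The `roc_auc` helper is one illustrative way to quantify "discrimination" between correct and incorrect PSMs; the abstract does not state which statistic the study used.

```python
import math
from typing import Dict, Sequence


def delta_rt(observed_rt: float, predicted_rt: float) -> float:
    """Delta RT: absolute difference between observed and predicted RT.

    Both values must be in the same unit (e.g. minutes); predicted_rt is
    assumed to come from an RT predictor such as AutoRT.
    """
    return abs(observed_rt - predicted_rt)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length vectors with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for a constant vector; 0.0 is an assumed fallback.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def spectral_similarity(observed: Dict[str, float],
                        predicted: Dict[str, float]) -> float:
    """Pearson r between observed and predicted fragment-ion intensities.

    Hypothetical representation: {ion_label: intensity}. An ion present in
    only one spectrum gets zero intensity in the other.
    """
    ions = sorted(set(observed) | set(predicted))
    obs = [observed.get(ion, 0.0) for ion in ions]
    pred = [predicted.get(ion, 0.0) for ion in ions]
    return pearson(obs, pred)


def roc_auc(correct: Sequence[float], incorrect: Sequence[float],
            higher_is_better: bool = True) -> float:
    """Probability that a random correct PSM outscores a random incorrect one.

    This is the ROC AUC in its Mann-Whitney form; ties count as 0.5.
    Use higher_is_better=False for Delta RT, where smaller is better.
    """
    if not correct or not incorrect:
        raise ValueError("both groups must be non-empty")
    sign = 1.0 if higher_is_better else -1.0
    wins = 0.0
    for c in correct:
        for i in incorrect:
            if sign * c > sign * i:
                wins += 1.0
            elif sign * c == sign * i:
                wins += 0.5
    return wins / (len(correct) * len(incorrect))
```

For example, `roc_auc([0.2, 0.5], [3.1, 4.0], higher_is_better=False)` returns 1.0, because every "correct" value in this toy input has a smaller Delta RT than every "incorrect" one. The pairwise loop is quadratic and meant only to make the definition explicit; a rank-based computation scales better for large PSM sets.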
The benchmark metrics demonstrated in this study will enable users to select computational pipelines and parameters for routine analysis of phosphoproteomics data and will offer guidance for developers to improve computational methods.
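To compare pipelines on real data beyond counting identifications, one option is to summarize each pipeline's PSMs by the distribution of each metric. The sketch below reports the PSM count and the median of each metric per pipeline. The `PSM` record, the `summarize` function, and the choice of medians are hypothetical illustrations; the abstract does not describe how the study aggregated the metrics.

```python
from statistics import median
from typing import Dict, List, NamedTuple


class PSM(NamedTuple):
    """Hypothetical per-PSM record holding the three metrics."""
    delta_rt: float             # |observed RT - predicted RT|, e.g. minutes
    spectral_similarity: float  # Pearson r, observed vs. predicted spectrum
    site_probability: float     # predicted phosphosite probability (0 to 1)


def summarize(pipelines: Dict[str, List[PSM]]) -> Dict[str, Dict[str, float]]:
    """Return the PSM count and the median of each metric per pipeline.

    Pipelines with no PSMs are skipped, since a median is undefined for them.
    """
    summary: Dict[str, Dict[str, float]] = {}
    for name, psms in pipelines.items():
        if not psms:
            continue
        summary[name] = {
            "n_psms": float(len(psms)),
            # Lower is better: observed RT agrees with the predicted RT.
            "median_delta_rt": median(p.delta_rt for p in psms),
            # Higher is better: observed spectrum agrees with the predicted one.
            "median_similarity": median(p.spectral_similarity for p in psms),
            # Higher is better: reported sites are predicted to be phosphosites.
            "median_site_probability": median(p.site_probability for p in psms),
        }
    return summary


if __name__ == "__main__":
    # Toy values for illustration only; not data from the study.
    demo = {
        "pipeline_A": [PSM(1.0, 0.92, 0.81), PSM(3.0, 0.75, 0.64)],
        "pipeline_B": [PSM(5.0, 0.31, 0.22)],
    }
    for name, stats in summarize(demo).items():
        print(name, stats)
```

Looking at metric distributions alongside counts matters because a pipeline could report more PSMs simply by accepting more incorrect matches, which would tend to appear as a higher median Delta RT and a lower median spectral similarity.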

Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5d09/8609164/51b7fae47804/fx1.jpg
