Predicting intensity ranks of peptide fragment ions.

Author information

Frank Ari M

Affiliation

Department of Computer Science and Engineering, University of California, San Diego (UCSD), 9500 Gilman Drive, La Jolla, California 92093-0404, USA.

Publication information

J Proteome Res. 2009 May;8(5):2226-40. doi: 10.1021/pr800677f.

DOI:10.1021/pr800677f

PMID:19256476

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2738854/

Abstract

Accurate modeling of peptide fragmentation is necessary for the development of robust scoring functions for peptide-spectrum matches, which are the cornerstone of MS/MS-based identification algorithms. Unfortunately, peptide fragmentation is a complex process that can involve several competing chemical pathways, which makes it difficult to develop generative probabilistic models that describe it accurately. However, the vast amounts of MS/MS data being generated now make it possible to use data-driven machine learning methods to develop discriminative ranking-based models that predict the intensity ranks of a peptide's fragment ions. We use simple sequence-based features that get combined by a boosting algorithm into models that make peak rank predictions with high accuracy. In an accompanying manuscript, we demonstrate how these prediction models are used to significantly improve the performance of peptide identification algorithms. The models can also be useful in the design of optimal multiple reaction monitoring (MRM) transitions, in cases where there is insufficient experimental data to guide the peak selection process. The prediction algorithm can also be run independently through PepNovo+, which is available for download from http://bix.ucsd.edu/Software/PepNovo.html.
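
The idea of combining simple sequence-based features into a rank prediction can be illustrated with a minimal sketch. It is not the paper's model: the feature tests and weights below are hypothetical and hand-set rather than learned, and the names `Fragment`, `WEAK_RANKERS`, and `predict_ranks` are invented for this example. It only shows the general shape of a boosted ranker, where each fragment's score is a weighted sum of weak rankers and fragments are ordered by that score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    ion_type: str   # "b" (N-terminal) or "y" (C-terminal)
    index: int      # number of residues in the fragment
    n_side: str     # residue on the N-terminal side of the cleaved bond
    c_side: str     # residue on the C-terminal side of the cleaved bond
    rel_pos: float  # cleavage position divided by peptide length


def enumerate_fragments(peptide):
    """List the b and y fragments produced by each backbone cleavage."""
    n = len(peptide)
    frags = []
    for cut in range(1, n):
        n_side, c_side = peptide[cut - 1], peptide[cut]
        rel = cut / n
        frags.append(Fragment("b", cut, n_side, c_side, rel))
        frags.append(Fragment("y", n - cut, n_side, c_side, rel))
    return frags


# Hypothetical weak rankers as (feature test, weight) pairs.
# In a boosted model the weights would be learned from training spectra;
# these values are made up purely for illustration.
WEAK_RANKERS = [
    (lambda f: f.ion_type == "y", 1.0),
    (lambda f: f.c_side == "P", 0.8),
    (lambda f: f.n_side in "DE", 0.4),
    (lambda f: 0.3 <= f.rel_pos <= 0.7, 0.5),
    (lambda f: f.index == 1, -1.0),
]


def score(fragment):
    """Weighted sum of the weak rankers that fire for this fragment."""
    return sum(w for test, w in WEAK_RANKERS if test(fragment))


def predict_ranks(peptide):
    """Return (rank, ion label) pairs, rank 1 = predicted strongest peak."""
    ordered = sorted(enumerate_fragments(peptide), key=score, reverse=True)
    return [(rank, f"{f.ion_type}{f.index}")
            for rank, f in enumerate(ordered, start=1)]


if __name__ == "__main__":
    for rank, label in predict_ranks("PEPTIDE"):
        print(rank, label)
```

Under the same assumptions, the top few entries of `predict_ranks(...)` could serve as candidate fragments when shortlisting MRM transitions for a peptide that lacks experimental spectra.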

Similar articles

Predicting intensity ranks of peptide fragment ions.

J Proteome Res. 2009 May;8(5):2226-40. doi: 10.1021/pr800677f.

A ranking-based scoring function for peptide-spectrum matches.

J Proteome Res. 2009 May;8(5):2241-52. doi: 10.1021/pr800678b.

MS2PIP: a tool for MS/MS peak intensity prediction.

Bioinformatics. 2013 Dec 15;29(24):3199-203. doi: 10.1093/bioinformatics/btt544. Epub 2013 Sep 27.

Evaluating de novo sequencing in proteomics: already an accurate alternative to database-driven peptide identification?

Brief Bioinform. 2018 Sep 28;19(5):954-970. doi: 10.1093/bib/bbx033.

PepNovo: de novo peptide sequencing via probabilistic network modeling.

Anal Chem. 2005 Feb 15;77(4):964-73. doi: 10.1021/ac048788h.

Improving Peptide-Spectrum Matching by Fragmentation Prediction Using Hidden Markov Models.

J Proteome Res. 2019 Jun 7;18(6):2385-2396. doi: 10.1021/acs.jproteome.8b00499. Epub 2019 May 22.

Exploiting the kernel trick to correlate fragment ions for peptide identification via tandem mass spectrometry.

Bioinformatics. 2004 Aug 12;20(12):1948-54. doi: 10.1093/bioinformatics/bth186. Epub 2004 Mar 25.

Updated MS²PIP web server delivers fast and accurate MS² peak intensity prediction for multiple fragmentation methods, instruments and labeling techniques.

Nucleic Acids Res. 2019 Jul 2;47(W1):W295-W299. doi: 10.1093/nar/gkz299.

HI-bone: a scoring system for identifying phenylisothiocyanate-derivatized peptides based on precursor mass and high intensity fragment ions.

Anal Chem. 2013 Apr 2;85(7):3515-20. doi: 10.1021/ac303239g. Epub 2013 Mar 20.

A machine learning approach to explore the spectra intensity pattern of peptides using tandem mass spectrometry data.

BMC Bioinformatics. 2008 Jul 30;9:325. doi: 10.1186/1471-2105-9-325.

Cited by

TopLib: Building and Searching Top-Down Mass Spectral Libraries for Proteoform Identification.

Anal Chem. 2025 Jun 10;97(22):11443-11453. doi: 10.1021/acs.analchem.4c06627. Epub 2025 May 29.

Expanding N-glycopeptide identifications by modeling fragmentation, elution, and glycome connectivity.

Nat Commun. 2024 Jul 22;15(1):6168. doi: 10.1038/s41467-024-50338-5.

Statistical Framework for Identifying Differences in Similar Mass Spectra: Expanding Possibilities for Isomer Identification.

Anal Chem. 2023 May 2;95(17):6996-7005. doi: 10.1021/acs.analchem.3c00495. Epub 2023 Apr 17.

Accurate Prediction of y Ions in Beam-Type Collision-Induced Dissociation Using Deep Learning.

Anal Chem. 2022 Jun 7;94(22):7752-7758. doi: 10.1021/acs.analchem.1c03184. Epub 2022 May 24.

Peptidomics and Capillary Electrophoresis.

Adv Exp Med Biol. 2021;1336:87-104. doi: 10.1007/978-3-030-77252-9_5.

Construction of à la carte QconCAT protein standards for multiplexed quantification of user-specified target proteins.

BMC Biol. 2021 Sep 8;19(1):195. doi: 10.1186/s12915-021-01135-9.

CIDer: A Statistical Framework for Interpreting Differences in CID and HCD Fragmentation.

J Proteome Res. 2021 Apr 2;20(4):1951-1965. doi: 10.1021/acs.jproteome.0c00964. Epub 2021 Mar 17.

Deep learning embedder method and tool for mass spectra similarity search.

J Proteomics. 2021 Feb 10;232:104070. doi: 10.1016/j.jprot.2020.104070. Epub 2020 Dec 8.

Deep Learning in Proteomics.

Proteomics. 2020 Nov;20(21-22):e1900335. doi: 10.1002/pmic.201900335. Epub 2020 Oct 30.

Software-aided detection and structural characterization of cyclic peptide metabolites in biological matrix by high-resolution mass spectrometry.

J Pharm Anal. 2020 Jun;10(3):240-246. doi: 10.1016/j.jpha.2020.05.012. Epub 2020 May 26.

References

An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database.

J Am Soc Mass Spectrom. 1994 Nov;5(11):976-89. doi: 10.1016/1044-0305(94)80016-2.

Optimization and testing of mass spectral library search algorithms for compound identification.

J Am Soc Mass Spectrom. 1994 Sep;5(9):859-66. doi: 10.1016/1044-0305(94)87009-8.

A ranking-based scoring function for peptide-spectrum matches.

J Proteome Res. 2009 May;8(5):2241-52. doi: 10.1021/pr800678b.

Modeling peptide fragmentation with dynamic Bayesian networks for peptide identification.

Bioinformatics. 2008 Jul 1;24(13):i348-56. doi: 10.1093/bioinformatics/btn189.

Accurate annotation of peptide modifications through unrestrictive database search.

J Proteome Res. 2008 Jan;7(1):170-81. doi: 10.1021/pr070444v. Epub 2007 Nov 23.

Peptide fragment intensity statistical modeling.

Anal Chem. 2007 Oct 1;79(19):7286-90. doi: 10.1021/ac070488n. Epub 2007 Aug 23.

Whole proteome analysis of post-translational modifications: applications of mass-spectrometry for proteogenomic annotation.

Genome Res. 2007 Sep;17(9):1362-77. doi: 10.1101/gr.6427907. Epub 2007 Aug 9.

Using statistical models to identify factors that have a role in defining the abundance of ions produced by tandem MS.

Anal Chem. 2007 Aug 1;79(15):5601-7. doi: 10.1021/ac0700272. Epub 2007 Jun 20.

Relative specificities of water and ammonia losses from backbone fragments in collision-activated dissociation.

J Proteome Res. 2007 Jul;6(7):2669-73. doi: 10.1021/pr070121z. Epub 2007 May 16.

Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry.

Nat Methods. 2007 Mar;4(3):207-14. doi: 10.1038/nmeth1019.
