Investigation and Highly Accurate Prediction of Missed Tryptic Cleavages by Deep Learning.

Affiliations

Biomedical Center, Protein Analysis Unit, Faculty of Medicine, Ludwig-Maximilians-Universität München, Großhaderner Strasse 9, 82152 Planegg-Martinsried, Germany.

Institute of Stem Cell Research, Helmholtz Center Munich, German Research Center for Environmental Health, 85764 Munich, Germany.

Publication information

J Proteome Res. 2021 Jul 2;20(7):3749-3757. doi: 10.1021/acs.jproteome.1c00346. Epub 2021 Jun 17.

DOI:10.1021/acs.jproteome.1c00346

PMID:34137619

Abstract

Trypsin is one of the most important and widely used proteolytic enzymes in mass spectrometry (MS)-based proteomic research. It exclusively cleaves peptide bonds at the C-terminus of lysine and arginine. However, cleavage is also affected by several factors, including specific surrounding amino acids, resulting in frequent incomplete proteolysis and subsequent issues in peptide identification and quantification. Accurate annotation of missed cleavages is crucial to database searching in MS analysis. Here, we present deep-learning predicting missed cleavages (dpMC), a novel algorithm for the prediction of missed trypsin cleavage sites. The algorithm predicts missed cleavages with very high accuracy: the area under the curve (AUC) exceeds 0.99 in both cross-validation and holdout testing, and the mean F1 score and Matthews correlation coefficient (MCC) are 0.9677 and 0.9349, respectively. We tested our algorithm on data sets from different species and different experimental conditions, and it outperforms other currently available prediction methods. In addition, coupled with propensity and motif analysis, the method provides better insight into the detailed rules of trypsin cleavage. Moreover, our method can be integrated into database searching in MS analysis to identify and quantify mass spectra effectively and efficiently.
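To make concrete why missed cleavages matter for database searching, the following is a minimal sketch of an in silico tryptic digest that enumerates peptides with up to a fixed number of missed cleavages. It is not the dpMC algorithm and is not taken from the paper. It assumes only the simple rule stated in the abstract (cleavage C-terminal to K and R, with no exception for a following proline), and the function names `tryptic_sites` and `digest`, the parameter `max_missed`, and the example sequence are all hypothetical.

```python
from typing import List


def tryptic_sites(protein: str) -> List[int]:
    """Return cut positions: index i means a cut between protein[i-1] and protein[i].

    Assumption: every K or R is a candidate site; the C-terminal residue
    is skipped because there is no bond after it.
    """
    return [i + 1 for i, aa in enumerate(protein[:-1]) if aa in "KR"]


def digest(protein: str, max_missed: int = 2, min_len: int = 1) -> List[str]:
    """Enumerate peptides that span at most `max_missed` uncleaved K/R sites."""
    # Peptide boundaries: protein start, every candidate site, protein end.
    bounds = [0] + tryptic_sites(protein) + [len(protein)]
    peptides = []
    for start in range(len(bounds) - 1):
        for missed in range(max_missed + 1):
            end = start + 1 + missed
            if end >= len(bounds):
                break  # ran past the C-terminus
            peptide = protein[bounds[start]:bounds[end]]
            if len(peptide) >= min_len:
                peptides.append(peptide)
    return peptides


if __name__ == "__main__":
    seq = "MAKGGRPEK"  # hypothetical toy sequence
    print(digest(seq, max_missed=0))
    print(digest(seq, max_missed=1))
```

In this sketch the candidate list grows with every additional missed cleavage that is allowed, which illustrates why a predictor that flags the sites likely to be missed could help restrict the peptides a search engine needs to consider.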

Similar articles

Prediction of missed cleavage sites in tryptic peptides aids protein identification in proteomics.

J Proteome Res. 2007 Jan;6(1):399-408. doi: 10.1021/pr060507u.

Mimicking LysC Proteolysis by 'Arginine Modification-cum-Trypsin Digestion': Comparison of Bottom-up & Middle-down Proteomic Approaches by ESI Q-TOF MS.

Protein Pept Lett. 2021;28(12):1379-1390. doi: 10.2174/0929866528666210929163307.

In Silico Peptide Repertoire of Human Olfactory Receptor Proteomes on High-Stringency Mass Spectrometry.

J Proteome Res. 2019 Dec 6;18(12):4117-4123. doi: 10.1021/acs.jproteome.8b00494. Epub 2019 May 22.

Tryptic Peptides Bearing C-Terminal Dimethyllysine Need to Be Considered during the Analysis of Lysine Dimethylation in Proteomic Study.

J Proteome Res. 2017 Sep 1;16(9):3460-3469. doi: 10.1021/acs.jproteome.7b00373. Epub 2017 Aug 8.

VEMS 3.0: algorithms and computational tools for tandem mass spectrometry based identification of post-translational modifications in proteins.

J Proteome Res. 2005 Nov-Dec;4(6):2338-47. doi: 10.1021/pr050264q.

Prediction of missed proteolytic cleavages for the selection of surrogate peptides for quantitative proteomics.

OMICS. 2012 Sep;16(9):449-56. doi: 10.1089/omi.2011.0156. Epub 2012 Jul 17.

Trypsin cleaves exclusively C-terminal to arginine and lysine residues.

Mol Cell Proteomics. 2004 Jun;3(6):608-14. doi: 10.1074/mcp.T400003-MCP200. Epub 2004 Mar 19.

DeepDigest: Prediction of Protein Proteolytic Digestion with Deep Learning.

Anal Chem. 2021 Apr 20;93(15):6094-6103. doi: 10.1021/acs.analchem.0c04704. Epub 2021 Apr 7.

Predicting tryptic cleavage from proteomics data using decision tree ensembles.

J Proteome Res. 2013 May 3;12(5):2253-9. doi: 10.1021/pr4001114. Epub 2013 Apr 4.

Cited by

Role of artificial intelligence in revolutionizing drug discovery.

Fundam Res. 2024 May 9;5(3):1273-1287. doi: 10.1016/j.fmre.2024.04.021. eCollection 2025 May.

Comparison of N-Glycopeptide to Released N-Glycan Abundances and the Influence of Glycopeptide Mass and Charge States on N-Linked Glycosylation of IgG Antibodies.

J Proteome Res. 2024 Apr 5;23(4):1443-1457. doi: 10.1021/acs.jproteome.3c00904. Epub 2024 Mar 7.

Toward an Integrated Machine Learning Model of a Proteomics Experiment.

J Proteome Res. 2023 Mar 3;22(3):681-696. doi: 10.1021/acs.jproteome.2c00711. Epub 2023 Feb 6.

DNA and protein analyses of hair in forensic genetics.

Int J Legal Med. 2023 May;137(3):613-633. doi: 10.1007/s00414-023-02955-w. Epub 2023 Feb 3.
