DeepDigest: Prediction of Protein Proteolytic Digestion with Deep Learning.

Authors

Yang Jinghan, Gao Zhiqiang, Ren Xiuhan, Sheng Jie, Xu Ping, Chang Cheng, Fu Yan

Affiliations

CEMS, NCMIS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, P. R. China.

School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, P. R. China.

Publication information

Anal Chem. 2021 Apr 20;93(15):6094-6103. doi: 10.1021/acs.analchem.0c04704. Epub 2021 Apr 7.

DOI:10.1021/acs.analchem.0c04704

PMID:33826301

Abstract

Proteolytic digestion of proteins by one or more proteases is a key step in shotgun proteomics, in which the proteolytic products, i.e., peptides, are taken as the surrogates of their parent proteins for further qualitative or quantitative analysis. The proteases generally cleave proteins at specific amino acid residue sites, but digestion is hardly complete (wide existence of missed cleavage sites). Therefore, it would be of great help to improve the prior experimental design and the posterior data analysis if the digestion behaviors of proteases can be accurately modeled and predicted. At present, systematic studies about the commonly used proteases in proteomics are insufficient, and there is a lack of easy-to-use tools to predict the cleavage sites of different proteases. Here, we propose a novel sequence-based deep learning algorithm, DeepDigest, which integrates convolutional neural networks and long short-term memory networks for protein digestion prediction. DeepDigest can predict the cleavage probability of each potential cleavage site on the protein sequences for eight popular proteases including trypsin, ArgC, chymotrypsin, GluC, LysC, AspN, LysN, and LysargiNase. We compared DeepDigest with three traditional machine learning algorithms, i.e., logistic regression, random forest, and support vector machine. On the eight training data sets, the 10-fold cross-validation accuracies (AUCs) of DeepDigest were 0.956-0.982, significantly higher than those of the three traditional algorithms. On the 11 independent test data sets, DeepDigest achieved AUCs between 0.849 and 0.978, outperforming the other traditional algorithms in most cases. Transfer learning then further improved the prediction accuracy. Besides, some interesting characteristics of different proteases were revealed and discussed. Ultimately, as an application, we used DeepDigest to predict the digestibilities of peptides and demonstrated that peptide digestibility is an informative new feature to discriminate between correct and incorrect peptide identifications.
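
To make "each potential cleavage site" concrete, the following is a minimal Python sketch of how candidate sites for the eight proteases might be enumerated and turned into fixed-length sequence windows for a site-level classifier. It is not the DeepDigest implementation. The specificity table uses commonly cited simplified rules (ignoring exceptions such as proline effects), and the names `PROTEASE_RULES`, `candidate_sites`, `site_window`, the window size `flank`, and the padding character are all assumptions made for illustration.

```python
# Simplified, commonly cited protease specificities (an assumption for
# illustration; the rules used in the paper may differ).
# Side "C": the bond after the listed residue is a candidate site.
# Side "N": the bond before the listed residue is a candidate site.
PROTEASE_RULES = {
    "trypsin": ("KR", "C"),
    "ArgC": ("R", "C"),
    "chymotrypsin": ("FWYL", "C"),
    "GluC": ("E", "C"),
    "LysC": ("K", "C"),
    "AspN": ("D", "N"),
    "LysN": ("K", "N"),
    "LysargiNase": ("KR", "N"),
}


def candidate_sites(sequence, protease):
    """Return bond indices i, where i denotes the bond between
    sequence[i - 1] and sequence[i]."""
    residues, side = PROTEASE_RULES[protease]
    sites = []
    for i in range(1, len(sequence)):
        if side == "C" and sequence[i - 1] in residues:
            sites.append(i)
        elif side == "N" and sequence[i] in residues:
            sites.append(i)
    return sites


def site_window(sequence, site, flank=15, pad="-"):
    """Return 2 * flank residues centered on the bond, padded at the
    protein termini (flank and pad are hypothetical choices)."""
    left = sequence[max(0, site - flank):site].rjust(flank, pad)
    right = sequence[site:site + flank].ljust(flank, pad)
    return left + right


if __name__ == "__main__":
    protein = "MKRAD"
    for name in PROTEASE_RULES:
        print(name, candidate_sites(protein, name))
```

Each window produced this way could then be encoded and passed to a model that outputs a cleavage probability for that site; the model itself is out of scope for this sketch.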

Similar articles

DeepDetect: Deep Learning of Peptide Detectability Enhanced by Peptide Digestibility and Its Application to DIA Library Reduction.

Anal Chem. 2023 Apr 18;95(15):6235-6243. doi: 10.1021/acs.analchem.2c03662. Epub 2023 Mar 12.

Mimicking LysC Proteolysis by 'Arginine Modification-cum-Trypsin Digestion': Comparison of Bottom-up & Middle-down Proteomic Approaches by ESI Q-TOF MS.

Protein Pept Lett. 2021;28(12):1379-1390. doi: 10.2174/0929866528666210929163307.

Six alternative proteases for mass spectrometry-based proteomics beyond trypsin.

Nat Protoc. 2016 May;11(5):993-1006. doi: 10.1038/nprot.2016.057. Epub 2016 Apr 28.

Investigation and Highly Accurate Prediction of Missed Tryptic Cleavages by Deep Learning.

J Proteome Res. 2021 Jul 2;20(7):3749-3757. doi: 10.1021/acs.jproteome.1c00346. Epub 2021 Jun 17.

Proteomics beyond trypsin.

FEBS J. 2015 Jul;282(14):2612-26. doi: 10.1111/febs.13287. Epub 2015 Apr 14.

Value of using multiple proteases for large-scale mass spectrometry-based proteomics.

J Proteome Res. 2010 Mar 5;9(3):1323-9. doi: 10.1021/pr900863u.

Impact of Protease on Ultraviolet Photodissociation Mass Spectrometry for Bottom-up Proteomics.

J Proteome Res. 2015 Jun 5;14(6):2626-32. doi: 10.1021/acs.jproteome.5b00165. Epub 2015 May 18.

iProt-Sub: a comprehensive package for accurately mapping and predicting protease-specific substrates and cleavage sites.

Brief Bioinform. 2019 Mar 25;20(2):638-658. doi: 10.1093/bib/bby028.

Colon tumour secretopeptidome: insights into endogenous proteolytic cleavage events in the colon tumour microenvironment.

Biochim Biophys Acta. 2013 Nov;1834(11):2396-407. doi: 10.1016/j.bbapap.2013.05.006. Epub 2013 May 15.

Cited by

Peptide abundance correlations in metaproteomics enhance taxonomic and functional analysis of the human gut microbiome.

NPJ Biofilms Microbiomes. 2025 Aug 19;11(1):166. doi: 10.1038/s41522-025-00801-y.

Role of artificial intelligence in revolutionizing drug discovery.

Fundam Res. 2024 May 9;5(3):1273-1287. doi: 10.1016/j.fmre.2024.04.021. eCollection 2025 May.

Peptide Property Prediction for Mass Spectrometry Using AI: An Introduction to State of the Art Models.

Proteomics. 2025 May;25(9-10):e202400398. doi: 10.1002/pmic.202400398. Epub 2025 Apr 10.

A rice SOUL family heme-binding protein REAC1 enhances the antioxidative capacity of C. elegans through modulation of ROS-related gene expression.

Sci Rep. 2025 Mar 26;15(1):10379. doi: 10.1038/s41598-025-95254-w.

Challenges and Insights in Absolute Quantification of Recombinant Therapeutic Antibodies by Mass Spectrometry: An Introductory Review.

Antibodies (Basel). 2025 Jan 7;14(1):3. doi: 10.3390/antib14010003.

Post-translational modifications of proteins in cardiovascular diseases examined by proteomic approaches.

FEBS J. 2025 Jan;292(1):28-46. doi: 10.1111/febs.17108. Epub 2024 Mar 5.

A Protease-Responsive Polymer/Peptide Conjugate and Reversible Assembly of Silver Clusters for the Detection of Enzymatic Activity.

ACS Nano. 2023 Sep 12;17(17):17308-17319. doi: 10.1021/acsnano.3c05268. Epub 2023 Aug 21.

Molecular de-extinction of ancient antimicrobial peptides enabled by machine learning.

Cell Host Microbe. 2023 Aug 9;31(8):1260-1274.e6. doi: 10.1016/j.chom.2023.07.001. Epub 2023 Jul 28.

DbyDeep: Exploration of MS-Detectable Peptides via Deep Learning.

Anal Chem. 2023 Aug 1;95(30):11193-11200. doi: 10.1021/acs.analchem.3c00460. Epub 2023 Jul 17.

Toward an Integrated Machine Learning Model of a Proteomics Experiment.

J Proteome Res. 2023 Mar 3;22(3):681-696. doi: 10.1021/acs.jproteome.2c00711. Epub 2023 Feb 6.
