CEMS, NCMIS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China.
School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China.
Anal Chem. 2023 Apr 18;95(15):6235-6243. doi: 10.1021/acs.analchem.2c03662. Epub 2023 Mar 12.
In tandem mass spectrometry-based proteomics, proteins are digested into peptides by specific protease(s), but generally only a fraction of the resulting peptides can be detected. To characterize detectable proteotypic peptides, we have developed a series of methods to predict peptide digestibility and detectability. Here, we propose a bidirectional long short-term memory (BiLSTM)-based algorithm, named DeepDetect, that predicts peptide detectability with the help of peptide digestibility. Compared with existing algorithms, DeepDetect offers improved prediction accuracy across a wide range of commonly used proteases, covering trypsin, ArgC, chymotrypsin, GluC, LysC, AspN, LysN, and LysargiNase. On 11 test data sets from , yeast, mouse, and human samples, DeepDetect achieved higher prediction accuracies than PepFormer, a state-of-the-art deep-learning-based peptide detectability prediction algorithm. The results further demonstrated that peptide digestibility can substantially enhance the performance of peptide detectability predictors. As an application, DeepDetect was used to reduce the predicted spectral libraries in data-independent acquisition mass spectrometry data analysis. Experiments using DIA-NN software showed that DeepDetect can significantly accelerate the library search without loss of peptide and protein identification sensitivity.
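The library-reduction idea described above can be illustrated with a minimal sketch: digest a protein in silico, score each peptide, and keep only those above a cutoff. Everything in the block is an assumption made for illustration. The simplified trypsin rule (cleave after K or R unless followed by P), the function names, the 0.5 threshold, and in particular `placeholder_score` are hypothetical; `placeholder_score` is a trivial length filter standing in for a trained detectability predictor and is not the DeepDetect model.

```python
from typing import Callable, List


def tryptic_digest(protein: str, missed_cleavages: int = 0) -> List[str]:
    """In silico digestion with a simplified trypsin rule:
    cleave after K or R unless the next residue is P."""
    pieces, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            pieces.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        pieces.append(protein[start:])  # C-terminal remainder

    # Join up to `missed_cleavages` adjacent fully cleaved pieces.
    peptides = []
    for i in range(len(pieces)):
        last = min(i + missed_cleavages, len(pieces) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides


def placeholder_score(peptide: str) -> float:
    """Hypothetical stand-in for a detectability predictor.
    Returns 1.0 for peptides of 6 to 30 residues, else 0.0."""
    return 1.0 if 6 <= len(peptide) <= 30 else 0.0


def reduce_library(
    peptides: List[str],
    score_fn: Callable[[str], float],
    threshold: float = 0.5,  # assumed cutoff, not taken from the paper
) -> List[str]:
    """Keep only peptides whose predicted score reaches the threshold."""
    return [p for p in peptides if score_fn(p) >= threshold]


if __name__ == "__main__":
    candidates = tryptic_digest("AKRPGKM", missed_cleavages=1)
    print(candidates)
    print(reduce_library(candidates, placeholder_score))
```

In a real workflow, `score_fn` would wrap a trained model's prediction for the protease in use, and the retained peptides would define the entries kept in the predicted spectral library before it is searched.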