Bio Convergence Research Institute, Bertis Inc., Heungdeok 1-ro, Giheung-gu, Yongin-si, 16954 Gyeonggi-do, Republic of Korea.
Anal Chem. 2022 Jun 7;94(22):7752-7758. doi: 10.1021/acs.analchem.1c03184. Epub 2022 May 24.
Peptide fragmentation spectra contain critical information for the identification of peptides by mass spectrometry. In this study, we developed an algorithm that more accurately predicts the high-intensity peaks in peptide spectra. The training data comprise 180,833 peptides from the National Institute of Standards and Technology and Proteomics Identification databases, which were fragmented by either quadrupole time-of-flight or triple-quadrupole collision-induced dissociation methods. Exploratory analysis of the peptide fragmentation patterns focused on the highest-intensity peaks and showed that proline, peptide length, and combinations of four amino acids in a sliding window can be exploited as key features. The amino acid sequence of each peptide and each of the key features were allocated to different layers of the model, which uses recurrent, convolutional, and fully connected neural networks. The trained model, PrAI-frag, predicts fragmentation spectra more accurately than previous machine learning-based prediction algorithms. The model excels at high-intensity peak prediction, which is advantageous for selective/multiple reaction monitoring applications. PrAI-frag is provided via a Web server that can be used for peptides of length 6-15.
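The key features named in the abstract (proline, peptide length, and sliding windows of four amino acids) can be illustrated with a minimal sketch. This is not the PrAI-frag implementation: the function name `extract_features`, the returned dictionary layout, and the restriction to the 20 standard residues are assumptions made for illustration, and the abstract does not specify how the features are encoded before they reach the network layers. Only the 6-15 length range and the window size of four come from the abstract.

```python
from typing import Dict, List, Union

# Assumption: only the 20 standard amino acids, in one-letter code.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")

# Length range stated in the abstract for the PrAI-frag Web server.
MIN_LEN, MAX_LEN = 6, 15


def extract_features(peptide: str, window: int = 4) -> Dict[str, Union[int, List[int], List[str]]]:
    """Hypothetical helper: collect the three feature types named in the abstract."""
    seq = peptide.upper()
    if not MIN_LEN <= len(seq) <= MAX_LEN:
        raise ValueError(f"peptide length must be {MIN_LEN}-{MAX_LEN}, got {len(seq)}")
    unknown = set(seq) - AMINO_ACIDS
    if unknown:
        raise ValueError(f"unsupported residues: {sorted(unknown)}")
    return {
        # Peptide length.
        "length": len(seq),
        # Zero-based positions of proline residues.
        "proline_positions": [i for i, aa in enumerate(seq) if aa == "P"],
        # Every contiguous four-residue combination, left to right.
        "windows": [seq[i:i + window] for i in range(len(seq) - window + 1)],
    }


if __name__ == "__main__":
    print(extract_features("PEPTIDEK"))
```

A peptide of length n yields n - 3 windows of four residues, so the supported range of 6-15 residues gives between 3 and 12 windows per peptide.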