Predicting Peptide Ionization Efficiencies for Electrospray Ionization Mass Spectrometry Using Machine Learning.

Affiliations

David H. Koch School of Chemical Engineering Practice, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States.

Analytical Sciences, BioPharmaceuticals R&D, AstraZeneca, Gaithersburg, Maryland 20878, United States.

Publication Information

J Am Soc Mass Spectrom. 2024 Oct 2;35(10):2297-2307. doi: 10.1021/jasms.4c00137. Epub 2024 Sep 9.

Abstract

Mass spectrometry (MS) is inherently an information-rich technique. In this era of big data, label-free MS quantification for nontargeted studies has gained increasing popularity, especially for complex systems. One of the cornerstones of successful label-free quantification is the predictive modeling of ionization efficiency (IE) based on solutes' physicochemical properties. While IE modeling has been studied extensively for small molecules, there are limited reports on peptide IEs. In this study, we leverage the stoichiometric relationship in trypsin digests of well-characterized monoclonal antibodies (mAbs) to compile a data set of relative ionization efficiencies (RIEs) for 241 peptides. From each peptide's sequence, we computed a set of physicochemical descriptors, which were then used to train machine learning regression models to predict RIEs. Peptides shorter than 20 amino acids had RIEs that were highly correlated with their molecular weight. A random forest (RF) model best predicted the RIEs of a test data set, with a mean relative error of 23.9%. For larger peptides, a multilayer perceptron (MLP) model improved RIE prediction compared to current best practices, reducing the mean relative error from 60.5% to 32.0%. Finally, we also show the application of the RF model in label-free relative protein quantification and in improving the quantification of peptide post-translational modifications (PTMs). This approach to predicting peptide IEs from their sequences enables the development of accurate label-free quantification workflows for peptide and protein analysis.
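The abstract does not spell out formulas for the quantities it reports. The following is a minimal sketch of three of them under stated assumptions, not the authors' implementation: an RIE computed from an equimolar digest relative to a reference peptide (assuming molar amounts differ only by stoichiometric copy number), the mean relative error metric (assumed to be the mean of |predicted − observed| / observed, in percent), and an RIE-corrected PTM fraction. All function names are hypothetical.

```python
from statistics import mean


def relative_ionization_efficiency(intensity, copies, ref_intensity, ref_copies):
    """Hypothetical RIE: signal per mole of peptide relative to a reference peptide.

    Assumes both peptides come from the same digest, so their molar amounts
    differ only by their stoichiometric copy numbers (e.g. per mAb chain).
    """
    return (intensity / copies) / (ref_intensity / ref_copies)


def mean_relative_error(predicted, observed):
    """Assumed metric: mean(|pred - obs| / obs), expressed in percent."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    return 100.0 * mean(abs(p - o) / o for p, o in zip(predicted, observed))


def modified_fraction(mod_intensity, mod_rie, unmod_intensity, unmod_rie):
    """Hypothetical RIE-corrected PTM fraction.

    Each MS signal is divided by its (predicted) RIE to estimate a relative
    molar amount, so differences in ionization efficiency between the
    modified and unmodified peptide cancel before the ratio is taken.
    """
    mod_amount = mod_intensity / mod_rie
    unmod_amount = unmod_intensity / unmod_rie
    return mod_amount / (mod_amount + unmod_amount)
```

For example, with made-up numbers, a modified peptide with intensity 1.0e5 and predicted RIE 0.5 next to an unmodified form with intensity 8.0e5 and RIE 1.0 gives `modified_fraction(1.0e5, 0.5, 8.0e5, 1.0)` ≈ 0.2, whereas the uncorrected intensity ratio would suggest about 0.11.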
