Institute of Computer Science, Johannes Gutenberg University, 55128 Mainz, Germany.
Institute for Immunology, University Medical Center of the Johannes Gutenberg University, 55128 Mainz, Germany.
Bioinformatics. 2023 Sep 2;39(9). doi: 10.1093/bioinformatics/btad486.
Including ion mobility separation (IMS) in mass spectrometry proteomics experiments improves coverage and throughput. Many IMS devices allow the experimentally derived mobility of an ion to be linked to its collisional cross-section (CCS), a highly reproducible physicochemical property that depends on the ion's mass, charge and conformation in the gas phase. Thus, known peptide ion mobilities can be used to tailor acquisition methods or to refine database search results. The large space of potential peptide sequences, enlarged further by posttranslational modifications of amino acids, motivates an in silico predictor for peptide CCS. Recent studies explored the general performance of various machine-learning techniques; however, workflow engineering was of secondary importance in them. To be applicable in practice, such a tool should be generic, data driven, and easy to adapt to individual workflows for experimental design and data processing.
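The link between measured mobility and CCS mentioned above is commonly expressed through the Mason-Schamp equation. The abstract does not state which relation is used, so the following Python sketch is an illustration under stated assumptions rather than the authors' implementation: nitrogen as drift gas, an effective temperature of 305 K, and an ion mass approximated as m/z times charge. The function name `mason_schamp_ccs` and all parameter names are hypothetical.

```python
import math

# Physical constants in SI units.
E_CHARGE = 1.602176634e-19   # elementary charge, C
K_B = 1.380649e-23           # Boltzmann constant, J/K
DALTON = 1.66053906660e-27   # atomic mass unit, kg
LOSCHMIDT = 2.6867811e25     # gas number density at 0 degC and 1 atm, 1/m^3


def mason_schamp_ccs(inv_k0, mz, charge, gas_mass=28.0134, temperature=305.0):
    """Convert inverse reduced mobility (Vs/cm^2) to CCS (square angstrom).

    Assumptions (not taken from the abstract): N2 drift gas, T = 305 K,
    and ion mass approximated as mz * charge (proton masses ignored).
    """
    ion_mass = mz * charge                                   # Da, approximate
    reduced_mass = ion_mass * gas_mass / (ion_mass + gas_mass) * DALTON  # kg
    k0 = 1e-4 / inv_k0                                       # cm^2/Vs -> m^2/Vs
    omega = (
        3.0 * charge * E_CHARGE / (16.0 * LOSCHMIDT)
        * math.sqrt(2.0 * math.pi / (reduced_mass * K_B * temperature))
        / k0
    )                                                        # m^2
    return omega * 1e20                                      # m^2 -> angstrom^2


if __name__ == "__main__":
    # Hypothetical doubly charged peptide ion at m/z 500 with 1/K0 = 0.9 Vs/cm^2.
    print(round(mason_schamp_ccs(inv_k0=0.9, mz=500.0, charge=2), 1))
```

Because the conversion scales linearly with charge and with 1/K0, the same mobility reading maps to noticeably different CCS values for different charge states, which is one reason predictors of this kind take sequence and charge state together as input.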
We created ionmob, a Python-based framework for data preparation, training, and prediction of collisional cross-section values of peptides. It is easily customizable and includes a set of pretrained, ready-to-use models and preprocessing routines for training and inference. Using a set of ≈21 000 unique phosphorylated peptides and ≈17 000 MHC ligand sequence and charge state pairs, we expand the space of peptides that can be integrated into CCS prediction. Lastly, we investigate whether in silico predicted CCS values can increase confidence in identified peptides through re-scoring, and demonstrate that predicted CCS values complement existing predictors for that task.
The Python package is available on GitHub: https://github.com/theGreatHerrLebert/ionmob.