AlphaPeptDeep: a modular deep learning framework to predict peptide properties for proteomics.

Affiliations

Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany.

Proteomics Program, NNF Center for Protein Research, Faculty of Health Sciences, University of Copenhagen, Copenhagen, Denmark.

Publication information

Nat Commun. 2022 Nov 24;13(1):7238. doi: 10.1038/s41467-022-34904-3.

DOI:10.1038/s41467-022-34904-3

PMID:36433986

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9700817/

Abstract

Machine learning and in particular deep learning (DL) are increasingly important in mass spectrometry (MS)-based proteomics. Recent DL models can predict the retention time, ion mobility and fragment intensities of a peptide just from the amino acid sequence with good accuracy. However, DL is a very rapidly developing field in which new neural network architectures appear frequently, and these are challenging for proteomics researchers to incorporate. Here we introduce AlphaPeptDeep, a modular Python framework built on the PyTorch DL library that learns and predicts the properties of peptides (https://github.com/MannLabs/alphapeptdeep). It features a model shop that enables non-specialists to create models in just a few lines of code. AlphaPeptDeep represents post-translational modifications in a generic manner, even if only the chemical composition is known. Extensive use of transfer learning obviates the need for large data sets to refine models for particular experimental conditions. The AlphaPeptDeep models for predicting retention time, collisional cross sections and fragment intensities are at least on par with existing tools. Additional sequence-based properties can also be predicted by AlphaPeptDeep, as demonstrated with an HLA peptide prediction model to improve HLA peptide identification for data-independent acquisition (https://github.com/MannLabs/PeptDeep-HLA).
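To make the idea of representing a modification by its chemical composition concrete, the following is a minimal sketch in plain Python. It is not AlphaPeptDeep's actual encoding or API: the function names, the element order in `ELEMENTS`, the single "other elements" slot, and the `H(1)O(3)P(1)` formula syntax are all assumptions chosen for illustration. The point is only that any modification with a known elemental composition can be mapped to a fixed-length numeric vector that a neural network could take as input, without needing a dedicated token for each modification name.

```python
import re
from typing import Dict, List

# Hypothetical fixed element order for the feature vector (illustrative assumption).
ELEMENTS: List[str] = ["C", "H", "N", "O", "S", "P"]

# Matches tokens such as "H(1)", "Se(1)" or "S(-1)" in an assumed formula syntax.
# Isotope-labelled tokens (e.g. a leading mass number) are not handled here.
_TOKEN = re.compile(r"([A-Z][a-z]?)\((-?\d+)\)")


def parse_composition(formula: str) -> Dict[str, int]:
    """Parse a formula like 'H(1)O(3)P(1)' into element -> atom count."""
    counts: Dict[str, int] = {}
    for element, n in _TOKEN.findall(formula):
        counts[element] = counts.get(element, 0) + int(n)
    return counts


def composition_vector(formula: str) -> List[int]:
    """Return counts for ELEMENTS in order, plus one summed slot for all other elements."""
    counts = parse_composition(formula)
    known = [counts.get(e, 0) for e in ELEMENTS]
    other = sum(n for e, n in counts.items() if e not in ELEMENTS)
    return known + [other]


# Phosphorylation adds HPO3; acetylation adds C2H2O.
phospho = composition_vector("H(1)O(3)P(1)")
acetyl = composition_vector("C(2)H(2)O(1)")
print(phospho)  # [0, 1, 0, 3, 0, 1, 0]
print(acetyl)   # [2, 2, 0, 1, 0, 0, 0]
```

Because the vector depends only on atom counts, a modification that was never seen during training still receives a meaningful input representation, which is the property the abstract highlights. Note that two modifications with identical composition would be indistinguishable under this simplified sketch.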
