MS2AI: automated repurposing of public peptide LC-MS data for machine learning applications.

Author information

Rehfeldt Tobias Greisager, Krawczyk Konrad, Bøgebjerg Mathias, Schwämmle Veit, Röttger Richard

Affiliations

Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark.

Department of Biochemistry and Molecular Biology, University of Southern Denmark, Odense, Denmark.

Publication information

Bioinformatics. 2022 Jan 12;38(3):875-877. doi: 10.1093/bioinformatics/btab701.

DOI: 10.1093/bioinformatics/btab701
PMID: 34636883

Abstract

MOTIVATION

Liquid chromatography-mass spectrometry (LC-MS) is the established standard for analyzing the proteome in biological samples by identification and quantification of thousands of proteins. Machine learning (ML) promises to considerably improve the analysis of the resulting data; however, there is not yet any tool that mediates the path from raw data to modern ML applications. More specifically, ML applications are currently hampered by three major limitations: (i) absence of balanced training data with large sample size; (ii) unclear definition of sufficiently information-rich data representations for tasks such as peptide identification; (iii) lack of benchmarking of ML methods on specific LC-MS problems.

RESULTS

We created the MS2AI pipeline, which automates the process of gathering vast quantities of MS data for large-scale ML applications. The software retrieves raw data either from in-house sources or from the proteomics identifications database, PRIDE. Subsequently, the raw data are stored in a standardized format amenable to ML, encompassing MS1/MS2 spectra and peptide identifications. This tool bridges the gap between MS and AI, and to this end we also present an ML application in the form of a convolutional neural network for the identification of oxidized peptides.
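
As a rough illustration of what a "standardized format amenable to ML" could look like, the sketch below pairs one MS2 spectrum with its peptide identification and converts the peak list into a fixed-length vector that a neural network could consume. This is not MS2AI's actual data format or code: the `SpectrumRecord` class, the `bin_spectrum` function, the m/z range, and the bin count are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SpectrumRecord:
    """Hypothetical record pairing one MS2 spectrum with its identification."""
    peaks: List[Tuple[float, float]]  # (m/z, intensity) pairs
    peptide: Optional[str] = None     # identified sequence, if any
    oxidized: bool = False            # example label for a classifier


def bin_spectrum(peaks, mz_min=100.0, mz_max=2000.0, n_bins=1900):
    """Sum intensities into equal-width m/z bins, then scale so the max is 1.

    The range and bin count are assumptions for illustration only.
    Peaks outside [mz_min, mz_max) are ignored.
    """
    width = (mz_max - mz_min) / n_bins
    vector = [0.0] * n_bins
    for mz, intensity in peaks:
        if mz_min <= mz < mz_max:
            # Clamp the index to guard against floating-point rounding.
            index = min(int((mz - mz_min) / width), n_bins - 1)
            vector[index] += intensity
    top = max(vector)
    return [v / top for v in vector] if top > 0 else vector


# Example: two peaks fall into the first bin, one peak is out of range.
record = SpectrumRecord(
    peaks=[(100.5, 10.0), (100.7, 30.0), (500.2, 20.0), (2500.0, 99.0)],
    peptide="PEPTIDEM",
    oxidized=True,
)
features = bin_spectrum(record.peaks)
```

A fixed-length representation like this is one common way to feed spectra to a convolutional network; the representation used by the authors may differ.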

AVAILABILITY AND IMPLEMENTATION

An open-source implementation of the software can be found at https://gitlab.com/roettgerlab/ms2ai.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
