ProteomicsML: An Online Platform for Community-Curated Data sets and Tutorials for Machine Learning in Proteomics.

Affiliations

Institute for Mathematics and Computer Science, University of Southern Denmark, 5000 Odense, Denmark.

VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9052, Belgium.

Publication details

J Proteome Res. 2023 Feb 3;22(2):632-636. doi: 10.1021/acs.jproteome.2c00629. Epub 2023 Jan 24.

Abstract

Data set acquisition and curation are often the most difficult and time-consuming parts of a machine learning endeavor. This is especially true for proteomics data sets acquired by liquid chromatography (LC) coupled to mass spectrometry (MS), because of the extensive data reduction that takes place between raw data and machine learning-ready data. Because predictive proteomics is an emerging field, each lab that predicts peptide behavior in LC-MS setups often uses its own unique and complex data processing pipeline to maximize performance, at the cost of accessibility and reproducibility. For this reason, we introduce ProteomicsML, an online resource offering proteomics data sets and tutorials for most of the physicochemical peptide properties currently being explored. This community-driven resource makes it simple to access data in easy-to-process formats and contains easy-to-follow tutorials that allow new users to interact with even the most advanced algorithms in the field. ProteomicsML provides data sets that are useful for comparing state-of-the-art machine learning algorithms, as well as introductory material for teachers and newcomers to the field alike. The platform is freely available at https://www.proteomicsml.org/, and we invite the entire proteomics community to contribute to the project at https://github.com/ProteomicsML/ProteomicsML.
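
To make "machine learning-ready data" concrete, here is a minimal, hypothetical sketch of one common last step in that data reduction: turning identified peptide sequences into fixed-length numeric vectors using amino acid composition, a simple encoding long used for peptide retention time prediction. The function names and the choice of composition features are illustrative assumptions, not the data format or code used by ProteomicsML, whose tutorials may rely on richer encodings.

```python
from collections import Counter

# The 20 standard amino acids in a fixed order, so every peptide maps to the
# same feature positions. Modified residues are out of scope for this sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(peptide: str) -> list[int]:
    """Return per-residue counts for a peptide as a 20-element feature vector."""
    counts = Counter(peptide.upper())
    unknown = set(counts) - set(AMINO_ACIDS)
    if unknown:
        # Reject anything outside the standard alphabet instead of silently dropping it.
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return [counts[aa] for aa in AMINO_ACIDS]


def to_feature_matrix(peptides: list[str]) -> list[list[int]]:
    """Stack composition vectors into a matrix with one row per peptide."""
    return [aa_composition(p) for p in peptides]


if __name__ == "__main__":
    for seq in ["PEPTIDE", "ACDK"]:
        print(seq, aa_composition(seq))
```

Composition vectors discard residue order, which keeps them easy to inspect but limits accuracy; sequence-aware models such as deep neural networks typically consume per-position encodings instead.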

Similar articles

1. Oktoberfest: Open-source spectral library generation and rescoring pipeline based on Prosit.
   Proteomics. 2024 Apr;24(8):e2300112. doi: 10.1002/pmic.202300112. Epub 2023 Sep 6.
2. Targeted Feature Detection for Data-Dependent Shotgun Proteomics.
   J Proteome Res. 2017 Aug 4;16(8):2964-2974. doi: 10.1021/acs.jproteome.7b00248. Epub 2017 Jul 19.
3. MSAcquisitionSimulator: data-dependent acquisition simulator for LC-MS shotgun proteomics.
   Bioinformatics. 2016 Apr 15;32(8):1269-71. doi: 10.1093/bioinformatics/btv745. Epub 2015 Dec 17.
4. MS-EmpiRe Utilizes Peptide-level Noise Distributions for Ultra-sensitive Detection of Differentially Expressed Proteins.
   Mol Cell Proteomics. 2019 Sep;18(9):1880-1892. doi: 10.1074/mcp.RA119.001509. Epub 2019 Jun 24.
5. MS²Rescore 3.0 Is a Modular, Flexible, and User-Friendly Platform to Boost Peptide Identifications, as Showcased with MS Amanda 3.0.
   J Proteome Res. 2024 Aug 2;23(8):3200-3207. doi: 10.1021/acs.jproteome.3c00785. Epub 2024 Mar 16.
6. Making MS Omics Data ML-Ready: SpeCollate Protocols.
   Methods Mol Biol. 2024;2836:135-155. doi: 10.1007/978-1-0716-4007-4_9.

Cited by

1. Progress and trends on machine learning in proteomics during 1997-2024: a bibliometric analysis.
   Front Med (Lausanne). 2025 Aug 15;12:1594442. doi: 10.3389/fmed.2025.1594442. eCollection 2025.
2. Peptide Property Prediction for Mass Spectrometry Using AI: An Introduction to State of the Art Models.
   Proteomics. 2025 May;25(9-10):e202400398. doi: 10.1002/pmic.202400398. Epub 2025 Apr 10.
3. Recent Advances in Mass Spectrometry-Based Bottom-Up Proteomics.
   Anal Chem. 2025 Mar 11;97(9):4728-4749. doi: 10.1021/acs.analchem.4c06750. Epub 2025 Feb 25.
4. The PRIDE database at 20 years: 2025 update.
   Nucleic Acids Res. 2025 Jan 6;53(D1):D543-D553. doi: 10.1093/nar/gkae1011.
5. Attracting Computational Researchers to Proteomics.
   J Am Soc Mass Spectrom. 2024 Oct 2;35(10):2544-2546. doi: 10.1021/jasms.4c00185. Epub 2024 Aug 30.
6. Variability analysis of LC-MS experimental factors and their impact on machine learning.
   Gigascience. 2022 Dec 28;12. doi: 10.1093/gigascience/giad096. Epub 2023 Nov 20.
7. The Role of Clinical Glyco(proteo)mics in Precision Medicine.
   Mol Cell Proteomics. 2023 Jun;22(6):100565. doi: 10.1016/j.mcpro.2023.100565. Epub 2023 May 9.
8. Toward an Integrated Machine Learning Model of a Proteomics Experiment.
   J Proteome Res. 2023 Mar 3;22(3):681-696. doi: 10.1021/acs.jproteome.2c00711. Epub 2023 Feb 6.
9. Proteomic repository data submission, dissemination, and reuse: key messages.
   Expert Rev Proteomics. 2022 Jul-Dec;19(7-12):297-310. doi: 10.1080/14789450.2022.2160324. Epub 2022 Dec 26.
