ProteomicsML: An Online Platform for Community-Curated Data sets and Tutorials for Machine Learning in Proteomics.

Affiliations

Institute for Mathematics and Computer Science, University of Southern Denmark, 5000 Odense, Denmark.

VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9052, Belgium.

Publication information

J Proteome Res. 2023 Feb 3;22(2):632-636. doi: 10.1021/acs.jproteome.2c00629. Epub 2023 Jan 24.

DOI:10.1021/acs.jproteome.2c00629
PMID:36693629
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9903315/
Abstract

Data set acquisition and curation are often the most difficult and time-consuming parts of a machine learning endeavor. This is especially true for proteomics data sets acquired by liquid chromatography (LC) coupled to mass spectrometry (MS), because of the high levels of data reduction that occur between raw data and machine learning-ready data. Because predictive proteomics is an emerging field, labs that predict peptide behavior in LC-MS setups often each use their own unique and complex data processing pipelines to maximize performance, at the cost of accessibility and reproducibility. For this reason, we introduce ProteomicsML, an online resource for proteomics-based data sets and tutorials covering most of the physicochemical peptide properties explored to date. This community-driven resource makes it simple to access data in easy-to-process formats and contains easy-to-follow tutorials that allow new users to interact with even the most advanced algorithms in the field. ProteomicsML provides data sets that are useful for comparing state-of-the-art machine learning algorithms, as well as introductory material for teachers and newcomers to the field alike. The platform is freely available at https://www.proteomicsml.org/, and we welcome the entire proteomics community to contribute to the project at https://github.com/ProteomicsML/ProteomicsML.
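The abstract's central point is that LC-MS data must be reduced heavily before a model can use it. As a minimal, hedged illustration of one such reduction step for peptide property prediction (for example, retention time), the sketch below turns peptide sequences into fixed-length amino-acid composition vectors, a common simple baseline featurization. It is not ProteomicsML's code or data format: `composition_vector` and `encode_batch` are hypothetical names, and modified residues, which real data sets carry, are rejected here rather than encoded.

```python
from collections import Counter

# The 20 standard amino acids in a fixed order, so every vector has the same layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_vector(peptide: str) -> list[int]:
    """Hypothetical helper: encode a peptide as per-residue counts.

    The result is a fixed-length (20) vector that ignores residue order.
    Assumption: only unmodified standard residues are supported.
    """
    peptide = peptide.upper()
    unknown = set(peptide) - set(AMINO_ACIDS)
    if unknown:
        # Modifications or non-standard letters need a richer encoding.
        raise ValueError(f"non-standard residues in {peptide!r}: {sorted(unknown)}")
    counts = Counter(peptide)
    return [counts[aa] for aa in AMINO_ACIDS]


def encode_batch(peptides: list[str]) -> list[list[int]]:
    """Hypothetical helper: build a feature matrix, one row per peptide."""
    return [composition_vector(p) for p in peptides]


if __name__ == "__main__":
    for row in encode_batch(["PEPTIDE", "AAK"]):
        print(row)
```

For example, `composition_vector("PEPTIDE")` yields a 20-element vector whose entries for P and E are 2 and whose entries sum to 7. Composition discards residue order, so models built on it are usually treated as baselines against which sequence-aware architectures are compared.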

Similar articles

1. Oktoberfest: Open-source spectral library generation and rescoring pipeline based on Prosit.
   Proteomics. 2024 Apr;24(8):e2300112. doi: 10.1002/pmic.202300112. Epub 2023 Sep 6.
2. Targeted Feature Detection for Data-Dependent Shotgun Proteomics.
   J Proteome Res. 2017 Aug 4;16(8):2964-2974. doi: 10.1021/acs.jproteome.7b00248. Epub 2017 Jul 19.
3. Decon2LS: An open-source software package for automated processing and visualization of high resolution mass spectrometry data.
   BMC Bioinformatics. 2009 Mar 17;10:87. doi: 10.1186/1471-2105-10-87.
4. MSAcquisitionSimulator: data-dependent acquisition simulator for LC-MS shotgun proteomics.
   Bioinformatics. 2016 Apr 15;32(8):1269-71. doi: 10.1093/bioinformatics/btv745. Epub 2015 Dec 17.
5. MS-EmpiRe Utilizes Peptide-level Noise Distributions for Ultra-sensitive Detection of Differentially Expressed Proteins.
   Mol Cell Proteomics. 2019 Sep;18(9):1880-1892. doi: 10.1074/mcp.RA119.001509. Epub 2019 Jun 24.
6. MSRescore 3.0 Is a Modular, Flexible, and User-Friendly Platform to Boost Peptide Identifications, as Showcased with MS Amanda 3.0.
   J Proteome Res. 2024 Aug 2;23(8):3200-3207. doi: 10.1021/acs.jproteome.3c00785. Epub 2024 Mar 16.
7. TIMSRescore: A Data Dependent Acquisition-Parallel Accumulation and Serial Fragmentation-Optimized Data-Driven Rescoring Pipeline Based on MSRescore.
   J Proteome Res. 2025 Mar 7;24(3):1067-1076. doi: 10.1021/acs.jproteome.4c00609. Epub 2025 Feb 6.
8. APP: an Automated Proteomics Pipeline for the analysis of mass spectrometry data based on multiple open access tools.
   BMC Bioinformatics. 2014 Dec 30;15(1):441. doi: 10.1186/s12859-014-0441-8.
9. Making MS Omics Data ML-Ready: SpeCollate Protocols.
   Methods Mol Biol. 2024;2836:135-155. doi: 10.1007/978-1-0716-4007-4_9.

Cited by

1. Progress and trends on machine learning in proteomics during 1997-2024: a bibliometric analysis.
   Front Med (Lausanne). 2025 Aug 15;12:1594442. doi: 10.3389/fmed.2025.1594442. eCollection 2025.
2. Peptide Property Prediction for Mass Spectrometry Using AI: An Introduction to State of the Art Models.
   Proteomics. 2025 May;25(9-10):e202400398. doi: 10.1002/pmic.202400398. Epub 2025 Apr 10.
3. Recent Advances in Mass Spectrometry-Based Bottom-Up Proteomics.
   Anal Chem. 2025 Mar 11;97(9):4728-4749. doi: 10.1021/acs.analchem.4c06750. Epub 2025 Feb 25.
4. The PRIDE database at 20 years: 2025 update.
   Nucleic Acids Res. 2025 Jan 6;53(D1):D543-D553. doi: 10.1093/nar/gkae1011.
5. Attracting Computational Researchers to Proteomics.
   J Am Soc Mass Spectrom. 2024 Oct 2;35(10):2544-2546. doi: 10.1021/jasms.4c00185. Epub 2024 Aug 30.
6. Variability analysis of LC-MS experimental factors and their impact on machine learning.
   Gigascience. 2022 Dec 28;12. doi: 10.1093/gigascience/giad096. Epub 2023 Nov 20.
7. The Role of Clinical Glyco(proteo)mics in Precision Medicine.
   Mol Cell Proteomics. 2023 Jun;22(6):100565. doi: 10.1016/j.mcpro.2023.100565. Epub 2023 May 9.
8. Toward an Integrated Machine Learning Model of a Proteomics Experiment.
   J Proteome Res. 2023 Mar 3;22(3):681-696. doi: 10.1021/acs.jproteome.2c00711. Epub 2023 Feb 6.
9. Proteomic repository data submission, dissemination, and reuse: key messages.
   Expert Rev Proteomics. 2022 Jul-Dec;19(7-12):297-310. doi: 10.1080/14789450.2022.2160324. Epub 2022 Dec 26.
