Transparent Exploration of Machine Learning for Biomarker Discovery from Proteomics and Omics Data.

Affiliations

OmicEra Diagnostics GmbH, 82152 Planegg, Germany.

Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen, Denmark.

Publication information

J Proteome Res. 2023 Feb 3;22(2):359-367. doi: 10.1021/acs.jproteome.2c00473. Epub 2022 Nov 25.

DOI: 10.1021/acs.jproteome.2c00473

PMID: 36426751

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9903317/

Abstract

Biomarkers are of central importance for assessing health state and for guiding medical interventions and assessing their efficacy; still, they are lacking for most diseases. Mass spectrometry (MS)-based proteomics is a powerful technology for biomarker discovery but requires sophisticated bioinformatics to identify robust patterns. Machine learning (ML) has become a promising tool for this purpose. However, it is sometimes applied in an opaque manner and generally requires specialized knowledge. To enable easy access to ML for biomarker discovery without any programming or bioinformatics skills, we developed "OmicLearn" (http://OmicLearn.org), an open-source browser-based ML tool using the latest advances in the Python ML ecosystem. Data matrices from omics experiments are easily uploaded to an online or a locally installed web server. OmicLearn enables rapid exploration of the suitability of various ML algorithms for the experimental data sets. It fosters open science via transparent assessment of state-of-the-art algorithms in a standardized format for proteomics and other omics sciences.
