Variability analysis of LC-MS experimental factors and their impact on machine learning.

Affiliations

Department of Mathematics and Computer Science, University of Southern Denmark, 5230 Odense, Denmark.

Department of Chemistry and Bioscience, Aalborg University, 9220 Aalborg, Denmark.

Publication information

Gigascience. 2022 Dec 28;12. doi: 10.1093/gigascience/giad096. Epub 2023 Nov 20.

DOI:10.1093/gigascience/giad096
PMID:37983748
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10659119/
Abstract

BACKGROUND

Machine learning (ML) technologies, especially deep learning (DL), have gained increasing attention in predictive mass spectrometry (MS) for enhancing the data-processing pipeline from raw data analysis to end-user predictions and rescoring. ML models need large-scale datasets for training and repurposing, which can be obtained from a range of public data repositories. However, applying ML to public MS datasets on larger scales is challenging, as they vary widely in terms of data acquisition methods, biological systems, and experimental designs.

RESULTS

We aim to facilitate ML efforts on MS data by systematically analyzing the potential sources of variability in public MS repositories. We also examine how these factors affect ML performance and carry out a comprehensive transfer learning evaluation to assess the benefits of current best-practice methods in the field.

CONCLUSIONS

Our findings show significantly higher homogeneity within a project than between projects, which indicates that it is important to construct datasets that closely resemble future test cases, as transferability to unseen datasets is severely limited. We also found that transfer learning, although it did increase model performance, did not outperform a non-pretrained model.
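
The within-project versus between-project distinction can be made concrete with a project-held-out split, in which every record from one project is kept out of training. The following is a minimal sketch under stated assumptions, not the authors' pipeline: the `project_id` field, the `leave_one_project_out` helper, and the toy records are all hypothetical names and values chosen for illustration.

```python
from collections import defaultdict


def leave_one_project_out(records, key="project_id"):
    """Yield (held_out_project, train, test) with one split per project.

    Every record of the held-out project goes to the test set, so the
    model is never evaluated on a project it saw during training.
    """
    by_project = defaultdict(list)
    for rec in records:
        by_project[rec[key]].append(rec)

    for held_out in sorted(by_project):
        test = by_project[held_out]
        train = [
            rec
            for project, recs in by_project.items()
            if project != held_out
            for rec in recs
        ]
        yield held_out, train, test


# Toy records: identifiers and values are invented for illustration only.
records = [
    {"project_id": "PXD_A", "peptide": "PEPTIDEK", "rt": 12.1},
    {"project_id": "PXD_A", "peptide": "SAMPLER", "rt": 30.4},
    {"project_id": "PXD_B", "peptide": "PEPTIDEK", "rt": 15.8},
    {"project_id": "PXD_C", "peptide": "ANOTHERK", "rt": 22.0},
]

splits = list(leave_one_project_out(records))
for held_out, train, test in splits:
    print(held_out, len(train), len(test))
```

A random split over all records would mix projects between training and test data and, given the reported within-project homogeneity, would likely overstate how well a model transfers to an unseen project.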

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fa10/10659119/81d545dfed94/giad096fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fa10/10659119/01f352c37d04/giad096fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fa10/10659119/0e1debb2d31c/giad096fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fa10/10659119/5c28770a8ab4/giad096fig4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fa10/10659119/6648fb39644e/giad096fig5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fa10/10659119/b2fe557f3df9/giad096fig6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fa10/10659119/39758f32b2ae/giad096fig7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fa10/10659119/91b168035894/giad096fig8.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fa10/10659119/83da884e74af/giad096fig9.jpg
