A learned embedding for efficient joint analysis of millions of mass spectra.

Affiliations

Skaggs School of Pharmacy and Pharmaceutical Science, University of California San Diego, La Jolla, CA, USA.

Department of Genome Sciences, University of Washington, Seattle, WA, USA.

Publication information

Nat Methods. 2022 Jun;19(6):675-678. doi: 10.1038/s41592-022-01496-1. Epub 2022 May 30.

DOI:10.1038/s41592-022-01496-1

PMID:35637305

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9189069/

Abstract

Computational methods that aim to exploit publicly available mass spectrometry repositories rely primarily on unsupervised clustering of spectra. Here we trained a deep neural network in a supervised fashion on the basis of previous assignments of peptides to spectra. The network, called 'GLEAMS', learns to embed spectra in a low-dimensional space in which spectra generated by the same peptide are close to one another. We applied GLEAMS for large-scale spectrum clustering, detecting groups of unidentified, proximal spectra representing the same peptide. We used these clusters to explore the dark proteome of repeatedly observed yet consistently unidentified mass spectra.
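The idea of an embedding in which spectra from the same peptide lie close together can be illustrated with a small sketch. The abstract does not state which loss function or clustering algorithm is used, so everything below is an assumption made for illustration: a pairwise contrastive loss stands in for the supervised training objective, and single-linkage grouping under a Euclidean distance threshold stands in for the clustering step. The function names (`contrastive_loss`, `cluster_by_distance`) and the `margin` and `threshold` parameters are hypothetical and do not come from the GLEAMS software.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(a, b, same_peptide, margin=1.0):
    """Assumed training objective (not confirmed by the abstract).

    Same-peptide pairs are penalized by their squared distance;
    different-peptide pairs are penalized only if closer than `margin`.
    """
    d = euclidean(a, b)
    if same_peptide:
        return d ** 2
    return max(0.0, margin - d) ** 2


def cluster_by_distance(embeddings, threshold):
    """Assumed clustering step: single-linkage grouping via union-find.

    Two spectra share a cluster label if a chain of embeddings connects
    them with every consecutive distance <= threshold.
    """
    parent = list(range(len(embeddings)))

    def find(i):
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All-pairs comparison is quadratic; fine for a toy example only.
    for i, j in combinations(range(len(embeddings)), 2):
        if euclidean(embeddings[i], embeddings[j]) <= threshold:
            parent[find(i)] = find(j)

    # Relabel roots as consecutive integers in order of first appearance.
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(embeddings))]


if __name__ == "__main__":
    toy = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.0, 5.1]]
    print(cluster_by_distance(toy, threshold=0.5))
```

In this toy setting, a cluster whose members all lack a peptide assignment would correspond to the kind of repeatedly observed but unidentified group the abstract describes. At the scale of millions of spectra, the all-pairs loop would need to be replaced by an approximate nearest-neighbor index.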

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6c01/9189069/5a9f6169892d/nihms-1798750-f0003.jpg

Similar articles

Deep learning embedder method and tool for mass spectra similarity search.

J Proteomics. 2021 Feb 10;232:104070. doi: 10.1016/j.jprot.2020.104070. Epub 2020 Dec 8.

Spectral archives: extending spectral libraries to analyze both identified and unidentified spectra.

Nat Methods. 2011 May 15;8(7):587-91. doi: 10.1038/nmeth.1609.

Comparison and Evaluation of Clustering Algorithms for Tandem Mass Spectra.

J Proteome Res. 2017 Nov 3;16(11):4035-4044. doi: 10.1021/acs.jproteome.7b00427.

Deep Convolutional Neural Networks Help Scoring Tandem Mass Spectrometry Data in Database-Searching Approaches.

J Proteome Res. 2021 Oct 1;20(10):4708-4717. doi: 10.1021/acs.jproteome.1c00315. Epub 2021 Aug 27.

Prosit: proteome-wide prediction of peptide tandem mass spectra by deep learning.

Nat Methods. 2019 Jun;16(6):509-518. doi: 10.1038/s41592-019-0426-7. Epub 2019 May 27.

MS2CNN: predicting MS/MS spectrum based on protein sequence using deep convolutional neural networks.

BMC Genomics. 2019 Dec 24;20(Suppl 9):906. doi: 10.1186/s12864-019-6297-6.

Unsupervised convolutional variational autoencoder deep embedding clustering for Raman spectra.

Anal Methods. 2022 Oct 13;14(39):3898-3910. doi: 10.1039/d2ay01184k.

Model based clustering for tandem mass spectrum quality assessment.

Annu Int Conf IEEE Eng Med Biol Soc. 2009;2009:6747-50. doi: 10.1109/IEMBS.2009.5332499.

ClusterSheep: A Graphics Processing Unit-Accelerated Software Tool for Large-Scale Clustering of Tandem Mass Spectra from Shotgun Proteomics.

J Proteome Res. 2021 Dec 3;20(12):5359-5367. doi: 10.1021/acs.jproteome.1c00485. Epub 2021 Nov 4.

Cited by

TopLib: Building and Searching Top-Down Mass Spectral Libraries for Proteoform Identification.

Anal Chem. 2025 Jun 10;97(22):11443-11453. doi: 10.1021/acs.analchem.4c06627. Epub 2025 May 29.

Self-supervised learning of molecular representations from millions of tandem mass spectra using DreaMS.

Nat Biotechnol. 2025 May 23. doi: 10.1038/s41587-025-02663-3.

Self-supervised learning from small-molecule mass spectrometry data.

Nat Biotechnol. 2025 May 23. doi: 10.1038/s41587-025-02677-x.

Proteomics Can Rise to the Challenge of Pseudogenes' Coding Nature.

J Proteome Res. 2024 Dec 6;23(12):5233-5249. doi: 10.1021/acs.jproteome.4c00116. Epub 2024 Nov 1.

Exploring the dynamic landscape of immunopeptidomics: Unravelling posttranslational modifications and navigating bioinformatics terrain.

Mass Spectrom Rev. 2025 Jul-Aug;44(4):599-629. doi: 10.1002/mas.21905. Epub 2024 Aug 16.

Sequence-to-sequence translation from mass spectra to peptides with a transformer model.

Nat Commun. 2024 Jul 30;15(1):6427. doi: 10.1038/s41467-024-49731-x.

SpecEncoder: deep metric learning for accurate peptide identification in proteomics.

Bioinformatics. 2024 Jun 28;40(Suppl 1):i257-i265. doi: 10.1093/bioinformatics/btae220.

Spectra without stories: reporting 94% dark and unidentified ancient proteomes.

Open Res Eur. 2024 Apr 15;4:71. doi: 10.12688/openreseurope.17225.1. eCollection 2024.

Spectroscape enables real-time query and visualization of a spectral archive in proteomics.

Nat Commun. 2023 Oct 7;14(1):6267. doi: 10.1038/s41467-023-42006-x.

HyperSpec: Ultrafast Mass Spectra Clustering in Hyperdimensional Space.

J Proteome Res. 2023 Jun 2;22(6):1639-1648. doi: 10.1021/acs.jproteome.2c00612. Epub 2023 May 11.
