JESTR: Joint Embedding Space Technique for Ranking Candidate Molecules for the Annotation of Untargeted Metabolomics Data.

Author information

Kalia Apurva, Krishnan Dilip, Hassoun Soha

Affiliations

Department of Computer Science, Tufts University, Medford, MA 02155, USA.

Google Research.

Publication information

ArXiv. 2024 Nov 25:arXiv:2411.14464v2.

PMID:39606728

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11601792/

Abstract

MOTIVATION

A major challenge in metabolomics is annotation: assigning molecular structures to mass spectral fragmentation patterns. Despite recent advances in molecule-to-spectra and in spectra-to-molecular fingerprint prediction (FP), annotation rates remain low.

RESULTS

We introduce in this paper a novel paradigm (JESTR) for annotation. Unlike prior approaches that construct molecular fingerprints or spectra, JESTR leverages the insight that molecules and their corresponding spectra are views of the same data and effectively embeds their representations in a joint space. Candidate structures are ranked based on cosine similarity between the embeddings of query spectrum and each candidate. We evaluate JESTR against mol-to-spec and spec-to-FP annotation tools on three datasets. On average, for rank@[1-5], JESTR outperforms other tools by 23.6% - 71.6%. We further demonstrate the strong value of regularization with candidate molecules during training, boosting rank@1 performance by 11.4% and enhancing the model's ability to discern between target and candidate molecules. Through JESTR, we offer a novel promising avenue towards accurate annotation, therefore unlocking valuable insights into the metabolome.

AVAILABILITY

Code and dataset available at https://github.com/HassounLab/JESTR1/.
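The ranking step described under RESULTS (scoring each candidate by the cosine similarity between its embedding and the query spectrum's embedding) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy three-dimensional vectors, and the `hit_at_k` helper are all hypothetical, and the embeddings are hand-written stand-ins for whatever the trained spectrum and molecule encoders would produce. Rank@k is read here as "the target molecule appears among the top k ranked candidates".

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(
    spectrum_emb: Sequence[float],
    candidate_embs: Dict[str, Sequence[float]],
) -> List[Tuple[str, float]]:
    """Sort candidates by descending cosine similarity to the query spectrum embedding."""
    scored = [
        (name, cosine_similarity(spectrum_emb, emb))
        for name, emb in candidate_embs.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def hit_at_k(ranking: List[Tuple[str, float]], target: str, k: int) -> bool:
    """True if the target is among the first k entries of the ranking."""
    return target in [name for name, _ in ranking[:k]]


# Toy, hand-made embeddings (hypothetical values, not model outputs).
query = [0.9, 0.1, 0.0]
candidates = {
    "candidate_A": [1.0, 0.0, 0.0],
    "candidate_B": [0.0, 1.0, 0.0],
    "candidate_C": [0.5, 0.5, 0.0],
}

ranking = rank_candidates(query, candidates)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Averaging `hit_at_k` over a set of query spectra would give a rank@k style score under the reading assumed above; how the paper aggregates its reported numbers is not stated in the abstract.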
