JESTR: Joint Embedding Space Technique for Ranking Candidate Molecules for the Annotation of Untargeted Metabolomics Data.

Authors

Kalia Apurva, Krishnan Dilip, Hassoun Soha

Affiliations

Department of Computer Science, Tufts University, Medford, MA 02155, USA.

Google Research.

Publication information

ArXiv. 2024 Nov 25:arXiv:2411.14464v2.

Abstract

MOTIVATION

A major challenge in metabolomics is annotation: assigning molecular structures to mass spectral fragmentation patterns. Despite recent advances in molecule-to-spectra prediction and in spectra-to-molecular-fingerprint (FP) prediction, annotation rates remain low.

RESULTS

We introduce JESTR, a novel paradigm for annotation. Unlike prior approaches that construct molecular fingerprints or spectra, JESTR leverages the insight that molecules and their corresponding spectra are views of the same data, and it embeds their representations in a joint space. Candidate structures are ranked by the cosine similarity between the embedding of the query spectrum and the embedding of each candidate. We evaluate JESTR against mol-to-spec and spec-to-FP annotation tools on three datasets. On average, for rank@[1-5], JESTR outperforms other tools by 23.6% to 71.6%. We further demonstrate the strong value of regularization with candidate molecules during training, which boosts rank@1 performance by 11.4% and enhances the model's ability to discern between target and candidate molecules. Through JESTR, we offer a promising new avenue towards accurate annotation, thereby unlocking valuable insights into the metabolome.
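The ranking step described above can be illustrated with a minimal sketch. It is not the JESTR implementation: the embedding vectors are made-up toy values standing in for the outputs of learned spectrum and molecule encoders, which are not shown here. The helper names (`cosine_similarity`, `rank_candidates`, `hit_at_k`) and the candidate labels are hypothetical, and reading rank@k as "the target appears among the top k candidates" is an assumption about the metric.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors score 0
    return dot / (norm_u * norm_v)


def rank_candidates(spectrum_embedding, candidate_embeddings):
    """Return candidate names sorted from most to least similar to the query."""
    scores = {
        name: cosine_similarity(spectrum_embedding, emb)
        for name, emb in candidate_embeddings.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


def hit_at_k(ranking, target, k):
    """True if the target is among the top-k ranked candidates (assumed rank@k)."""
    return target in ranking[:k]


# Toy embeddings (assumed values, for illustration only).
query = [1.0, 0.0, 1.0]
candidates = {
    "mol_A": [0.9, 0.1, 1.1],
    "mol_B": [0.0, 1.0, 0.0],
    "mol_C": [1.0, 1.0, 0.0],
}

ranking = rank_candidates(query, candidates)
print(ranking)
print(hit_at_k(ranking, "mol_C", 1), hit_at_k(ranking, "mol_C", 2))
```

Because cosine similarity ignores vector magnitude, only the direction of each embedding in the joint space affects the ranking in this sketch.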

AVAILABILITY

Code and dataset are available at https://github.com/HassounLab/JESTR1/.
