Kalia Apurva, Krishnan Dilip, Hassoun Soha
Department of Computer Science, Tufts University, Medford, MA 02155, USA.
Google Research.
ArXiv. 2024 Nov 25:arXiv:2411.14464v2.
A major challenge in metabolomics is annotation: assigning molecular structures to mass spectral fragmentation patterns. Despite recent advances in molecule-to-spectra (mol-to-spec) prediction and in spectra-to-molecular-fingerprint (spec-to-FP) prediction, annotation rates remain low.
We introduce JESTR, a novel paradigm for annotation. Unlike prior approaches that explicitly construct molecular fingerprints or spectra, JESTR leverages the insight that a molecule and its corresponding spectrum are views of the same data, and embeds their representations in a joint space. Candidate structures are ranked by the cosine similarity between the embedding of the query spectrum and the embedding of each candidate. We evaluate JESTR against mol-to-spec and spec-to-FP annotation tools on three datasets. On average, for rank@[1-5], JESTR outperforms other tools by 23.6%-71.6%. We further demonstrate the value of regularization with candidate molecules during training, which boosts rank@1 performance by 11.4% and improves the model's ability to discriminate between target and candidate molecules. JESTR thus offers a promising new avenue towards accurate annotation, thereby unlocking valuable insights into the metabolome.
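The ranking step described above can be illustrated with a minimal sketch. It is not taken from the released JESTR code: it assumes that the spectrum and molecule encoders have already produced embeddings of equal dimension in the joint space, and the function names `cosine_similarity`, `rank_candidates`, and `hit_at_k` are hypothetical helpers introduced here for illustration only.

```python
import math
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_u * norm_v)


def rank_candidates(query_emb: Sequence[float],
                    candidate_embs: Sequence[Sequence[float]]) -> List[int]:
    """Return candidate indices ordered from most to least similar to the query."""
    sims = [cosine_similarity(query_emb, c) for c in candidate_embs]
    return sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)


def hit_at_k(ranking: Sequence[int], target_index: int, k: int) -> bool:
    """True if the target molecule appears among the top-k ranked candidates."""
    return target_index in ranking[:k]


# Toy embeddings (made up for illustration, not real model outputs).
query = [1.0, 0.0]                  # embedding of the query spectrum
candidates = [[0.0, 1.0],           # orthogonal to the query
              [1.0, 0.1],           # nearly parallel to the query
              [-1.0, 0.0]]          # opposite to the query
ranking = rank_candidates(query, candidates)
```

In this toy case the nearly parallel candidate (index 1) is ranked first. A rank@k figure over a dataset would then be the fraction of query spectra for which `hit_at_k` is true, under the assumption that rank@k is computed in this standard way.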
Code and dataset available at https://github.com/HassounLab/JESTR1/.