Spectral entropy outperforms MS/MS dot product similarity for small-molecule compound identification.

Affiliations

West Coast Metabolomics Center, UC Davis Genome Center, University of California, Davis, CA, USA.

Olobion, Parc Científic de Barcelona, Barcelona, Spain.

Publication information

Nat Methods. 2021 Dec;18(12):1524-1531. doi: 10.1038/s41592-021-01331-z. Epub 2021 Dec 2.

DOI:10.1038/s41592-021-01331-z

PMID:34857935

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11492813/

Abstract

Compound identification in small-molecule research, such as untargeted metabolomics or exposome research, relies on matching tandem mass spectrometry (MS/MS) spectra against experimental or in silico mass spectral libraries. Most software programs use dot product similarity scores. Here we introduce the concept of MS/MS spectral entropy to improve scoring results in MS/MS similarity searches via library matching. Entropy similarity outperformed 42 alternative similarity algorithms, including dot product similarity, when searching 434,287 spectra against the high-quality NIST20 library. Entropy similarity scores proved to be highly robust even when we added different levels of noise ions. When we applied entropy levels to 37,299 experimental spectra of natural products, false discovery rates of less than 10% were observed at entropy similarity score 0.75. Experimental human gut metabolome data were used to confirm that entropy similarity largely improved the accuracy of MS-based annotations in small-molecule research to false discovery rates below 10%, annotated new compounds and provided the basis to automatically flag poor-quality, noisy spectra.
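The abstract names spectral entropy and entropy similarity without giving formulas. As a minimal sketch, assume the Shannon-entropy form: a spectrum's entropy is S = -Σ p_i ln p_i over its sum-normalized peak intensities, and the similarity of spectra A and B is 1 - (2·S_AB - S_A - S_B) / ln 4, where AB is their 1:1 mixture. The function names, the greedy peak matching, and the 0.01 m/z tolerance below are illustrative assumptions, and the sketch omits the noise removal and the intensity weighting for low-entropy spectra that a full implementation would apply.

```python
import math


def normalize(peaks):
    """Scale intensities so they sum to 1; peaks is a list of (mz, intensity)."""
    total = sum(intensity for _, intensity in peaks)
    return [(mz, intensity / total) for mz, intensity in peaks if intensity > 0]


def spectral_entropy(intensities):
    """Shannon entropy (natural log) of a list of peak intensities."""
    total = sum(intensities)
    return -sum((x / total) * math.log(x / total) for x in intensities if x > 0)


def entropy_similarity(spec_a, spec_b, tol=0.01):
    """Unweighted entropy similarity; tol is an assumed m/z matching tolerance."""
    a = sorted(normalize(spec_a))
    b = sorted(normalize(spec_b))
    merged = []  # intensities of the 1:1 mixed spectrum (before renormalizing)
    i = j = 0
    while i < len(a) and j < len(b):
        mz_a, int_a = a[i]
        mz_b, int_b = b[j]
        if abs(mz_a - mz_b) <= tol:   # same fragment ion: intensities add up
            merged.append(int_a + int_b)
            i += 1
            j += 1
        elif mz_a < mz_b:             # peak present only in spectrum A
            merged.append(int_a)
            i += 1
        else:                         # peak present only in spectrum B
            merged.append(int_b)
            j += 1
    merged.extend(x for _, x in a[i:])
    merged.extend(x for _, x in b[j:])

    s_a = spectral_entropy([x for _, x in a])
    s_b = spectral_entropy([x for _, x in b])
    s_ab = spectral_entropy(merged)
    return 1 - (2 * s_ab - s_a - s_b) / math.log(4)


query = [(100.0, 50.0), (150.0, 30.0), (200.0, 20.0)]
library_hit = [(100.005, 45.0), (150.0, 35.0), (200.0, 20.0)]
print(entropy_similarity(query, library_hit))
```

Under these assumptions, identical spectra score 1 and spectra with no shared peaks score 0, so the 0.75 threshold mentioned in the abstract sits on the same 0 to 1 scale as a dot product score.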

Similar articles

Spectral entropy outperforms MS/MS dot product similarity for small-molecule compound identification.

Nat Methods. 2021 Dec;18(12):1524-1531. doi: 10.1038/s41592-021-01331-z. Epub 2021 Dec 2.

Spec2Vec: Improved mass spectral similarity scoring through learning of structural relationships.

PLoS Comput Biol. 2021 Feb 16;17(2):e1008724. doi: 10.1371/journal.pcbi.1008724. eCollection 2021 Feb.

Metabolomic spectral libraries for data-independent SWATH liquid chromatography mass spectrometry acquisition.

Anal Bioanal Chem. 2018 Mar;410(7):1873-1884. doi: 10.1007/s00216-018-0860-x. Epub 2018 Feb 6.

Retrieving and Utilizing Hypothetical Neutral Losses from Tandem Mass Spectra for Spectral Similarity Analysis and Unknown Metabolite Annotation.

Anal Chem. 2020 Nov 3;92(21):14476-14483. doi: 10.1021/acs.analchem.0c02521. Epub 2020 Oct 19.

[A novel method for efficient screening and annotation of important pathway-associated metabolites based on the modified metabolome and probe molecules].

Se Pu. 2022 Sep;40(9):788-796. doi: 10.3724/SP.J.1123.2022.03025.

Methods to Calculate Spectrum Similarity.

Methods Mol Biol. 2017;1549:75-100. doi: 10.1007/978-1-4939-6740-7_7.

Flash entropy search to query all mass spectral libraries in real time.

Nat Methods. 2023 Oct;20(10):1475-1478. doi: 10.1038/s41592-023-02012-9. Epub 2023 Sep 21.

Customized Consensus Spectral Library Building for Untargeted Quantitative Metabolomics Analysis with Data Independent Acquisition Mass Spectrometry and MetaboDIA Workflow.

Anal Chem. 2017 May 2;89(9):4897-4906. doi: 10.1021/acs.analchem.6b05006. Epub 2017 Apr 18.

Offline Two-Dimensional Liquid Chromatography-Mass Spectrometry for Deep Annotation of the Fecal Metabolome Following Fecal Microbiota Transplantation.

J Proteome Res. 2024 Jun 7;23(6):2000-2012. doi: 10.1021/acs.jproteome.4c00022. Epub 2024 May 16.

Significance estimation for large scale metabolomics annotations by spectral matching.

Nat Commun. 2017 Nov 14;8(1):1494. doi: 10.1038/s41467-017-01318-5.

Cited by

Machine learning- and multilayer molecular network-assisted screening hunts fentanyl compounds.

Sci Adv. 2025 Sep 5;11(36):eadw2799. doi: 10.1126/sciadv.adw2799.

Machine Learning for Enhanced Identification Probability in RPLC/HRMS Nontargeted Workflows.

Anal Chem. 2025 Aug 26;97(33):18028-18035. doi: 10.1021/acs.analchem.5c01873. Epub 2025 Aug 12.

An evaluation methodology for machine learning-based tandem mass spectra similarity prediction.

BMC Bioinformatics. 2025 Jul 11;26(1):174. doi: 10.1186/s12859-025-06194-1.

MassCube improves accuracy for metabolomics data processing from raw files to phenotype classifiers.

Nat Commun. 2025 Jul 1;16(1):5487. doi: 10.1038/s41467-025-60640-5.

Longitudinal Fragment Profiles Based on Multi-Collision Energy Tandem Mass Spectra Improve the Accuracy of Metabolite Identification in Untargeted Metabolomics.

Anal Chem. 2025 Jul 15;97(27):14349-14360. doi: 10.1021/acs.analchem.5c01414. Epub 2025 Jul 1.

Untargeted analysis of hydrophilic metabolites using enhanced LC-MS separation with a pentafluoro phenylpropyl-functionalized column and prediction-based MS/MS spectrum annotation.

Metabolomics. 2025 Jun 14;21(4):79. doi: 10.1007/s11306-025-02272-w.

Neural Spectral Prediction for Structure Elucidation with Tandem Mass Spectrometry.

bioRxiv. 2025 Jun 1:2025.05.28.656653. doi: 10.1101/2025.05.28.656653.

Comparative analysis of continuous similarity measures for compound identification in mass spectrometry-based metabolomics.

Chemometr Intell Lab Syst. 2025 Aug 15;263. doi: 10.1016/j.chemolab.2025.105417. Epub 2025 May 3.

Self-supervised learning of molecular representations from millions of tandem mass spectra using DreaMS.

Nat Biotechnol. 2025 May 23. doi: 10.1038/s41587-025-02663-3.

LipidIN: a comprehensive repository for flash platform-independent annotation and reverse lipidomics.

Nat Commun. 2025 May 16;16(1):4566. doi: 10.1038/s41467-025-59683-5.

References

Metabolomics analysis of time-series human small intestine lumen samples collected .

Food Funct. 2021 Oct 4;12(19):9405-9415. doi: 10.1039/d1fo01574e.

Ion mobility collision cross-section atlas for known and unknown metabolite annotation in untargeted metabolomics.

Nat Commun. 2020 Aug 28;11(1):4334. doi: 10.1038/s41467-020-18171-8.

"Lipidomics": Mass spectrometric and chemometric analyses of lipids.

Adv Drug Deliv Rev. 2020;159:294-307. doi: 10.1016/j.addr.2020.06.009. Epub 2020 Jun 14.

JUMPm: A Tool for Large-Scale Identification of Metabolites in Untargeted Metabolomics.

Metabolites. 2020 May 12;10(5):190. doi: 10.3390/metabo10050190.

Retip: Retention Time Prediction for Compound Annotation in Untargeted Metabolomics.

Anal Chem. 2020 Jun 2;92(11):7515-7522. doi: 10.1021/acs.analchem.9b05765. Epub 2020 May 21.

The METLIN small molecule dataset for machine learning-based retention time prediction.

Nat Commun. 2019 Dec 20;10(1):5811. doi: 10.1038/s41467-019-13680-7.

ISiCLE: A Quantum Chemistry Pipeline for Establishing in Silico Collision Cross Section Libraries.

Anal Chem. 2019 Apr 2;91(7):4346-4356. doi: 10.1021/acs.analchem.8b04567. Epub 2019 Mar 6.

Untargeted Molecular Discovery in Primary Metabolism: Collision Cross Section as a Molecular Descriptor in Ion Mobility-Mass Spectrometry.

Anal Chem. 2018 Dec 18;90(24):14484-14492. doi: 10.1021/acs.analchem.8b04322. Epub 2018 Nov 30.

Liquid-chromatography retention order prediction for metabolite identification.

Bioinformatics. 2018 Sep 1;34(17):i875-i883. doi: 10.1093/bioinformatics/bty590.

Integrating tools for non-targeted analysis research and chemical safety evaluations at the US EPA.

J Expo Sci Environ Epidemiol. 2018 Sep;28(5):411-426. doi: 10.1038/s41370-017-0012-y. Epub 2017 Dec 29.
