Topic modeling for untargeted substructure exploration in metabolomics.

Author information

van der Hooft Justin Johan Jozias, Wandy Joe, Barrett Michael P, Burgess Karl E V, Rogers Simon

Affiliations

Glasgow Polyomics, University of Glasgow, Glasgow G61 1QH, United Kingdom.

Institute of Infection, Immunity, and Inflammation, College of Medical, Veterinary, and Life Sciences, University of Glasgow, Glasgow G12 8TA, United Kingdom.

Publication information

Proc Natl Acad Sci U S A. 2016 Nov 29;113(48):13738-13743. doi: 10.1073/pnas.1608041113. Epub 2016 Nov 16.

DOI:10.1073/pnas.1608041113

PMID:27856765

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5137707/

Abstract

The potential of untargeted metabolomics to answer important questions across the life sciences is hindered because of a paucity of computational tools that enable extraction of key biochemically relevant information. Available tools focus on using mass spectrometry fragmentation spectra to identify molecules whose behavior suggests they are relevant to the system under study. Unfortunately, fragmentation spectra cannot identify molecules in isolation but require authentic standards or databases of known fragmented molecules. Fragmentation spectra are, however, replete with information pertaining to the biochemical processes present, much of which is currently neglected. Here, we present an analytical workflow that exploits all fragmentation data from a given experiment to extract biochemically relevant features in an unsupervised manner. We demonstrate that an algorithm originally used for text mining, latent Dirichlet allocation, can be adapted to handle metabolomics datasets. Our approach extracts biochemically relevant molecular substructures ("Mass2Motifs") from spectra as sets of co-occurring molecular fragments and neutral losses. The analysis allows us to isolate molecular substructures, whose presence allows molecules to be grouped based on shared substructures regardless of classical spectral similarity. These substructures, in turn, support putative de novo structural annotation of molecules. Combining this spectral connectivity to orthogonal correlations (e.g., common abundance changes under system perturbation) significantly enhances our ability to provide mechanistic explanations for biological behavior.
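
The text-mining analogy in the abstract can be made concrete with a small sketch: each fragmentation spectrum is treated as a "document" whose "words" are its fragments and neutral losses, so that a topic model such as latent Dirichlet allocation can look for sets of words that co-occur across spectra. The Python below only illustrates that document-building step. The function names, the 0.01 m/z bin width, the word labels, and the scaling of intensities to pseudo-counts are all assumptions made for illustration, not the authors' parameters, and the two example spectra are illustrative values rather than data from the paper.

```python
from collections import Counter


def mz_word(kind, mz, bin_width=0.01):
    """Label a binned m/z value as a word, e.g. 'fragment_136.06'.

    The two-decimal label assumes the default 0.01 bin width.
    """
    return f"{kind}_{round(mz / bin_width) * bin_width:.2f}"


def spectrum_to_document(precursor_mz, peaks, bin_width=0.01, max_count=100):
    """Turn one MS/MS spectrum into a bag of words (hypothetical helper).

    peaks is a list of (fragment_mz, intensity) pairs. Each peak yields a
    fragment word and, if it is lighter than the precursor, a neutral-loss
    word (precursor m/z minus fragment m/z). Intensities are scaled to
    integer pseudo-counts because topic models operate on word counts.
    """
    doc = Counter()
    if not peaks:
        return doc
    top = max(intensity for _, intensity in peaks)
    for mz, intensity in peaks:
        count = round(max_count * intensity / top)
        if count == 0:
            continue  # peak too weak to contribute a word
        doc[mz_word("fragment", mz, bin_width)] += count
        loss = precursor_mz - mz
        if loss > bin_width:
            doc[mz_word("loss", loss, bin_width)] += count
    return doc


def shared_words(doc_a, doc_b):
    """Words present in both documents, regardless of their counts."""
    return set(doc_a) & set(doc_b)


# Two illustrative spectra with different precursors but one common fragment.
doc_a = spectrum_to_document(268.1040, [(136.0618, 1000.0), (119.0352, 50.0)])
doc_b = spectrum_to_document(348.0704, [(136.0618, 800.0)])
common = shared_words(doc_a, doc_b)
```

Here the two spectra share a fragment word even though their precursor masses and neutral losses differ, which is the kind of co-occurrence a topic model can pick up when overall spectral similarity is low. The resulting word counts could then be stacked into a document-by-word matrix and passed to any LDA implementation.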
