compMS2Miner: An Automatable Metabolite Identification, Visualization, and Data-Sharing R Package for High-Resolution LC-MS Data Sets.

Affiliations

Rappaport Lab, UC Berkeley, School of Public Health, GL81 Koshland Hall, Berkeley, California 94720, United States.

Metabolomics FiehnLab, NIH West-Coast Metabolomics Center (WCMC), University of California Davis, Davis, California 95616, United States.

Publication information

Anal Chem. 2017 Apr 4;89(7):3919-3928. doi: 10.1021/acs.analchem.6b02394. Epub 2017 Mar 27.

DOI:10.1021/acs.analchem.6b02394

PMID:28225587

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6338221/

Abstract

A long-standing challenge of untargeted metabolomic profiling by ultrahigh-performance liquid chromatography-high-resolution mass spectrometry (UHPLC-HRMS) is the efficient transition from unknown mass spectral features to confident metabolite annotations. The compMS2Miner (Comprehensive MS2 Miner) package was developed in the R language to facilitate rapid, comprehensive feature annotation using peak-picker output and MS2 data files as inputs. The number of MS2 spectra that can be collected during a metabolomic profiling experiment far exceeds what painstaking manual interpretation can cover in the time available; therefore, a degree of software workflow autonomy is required for broad-scale metabolite annotation. compMS2Miner integrates many useful tools in a single workflow for metabolite annotation and also provides a means to overview the MS2 data with a Web application GUI, compMS2Explorer (Comprehensive MS2 Explorer), that also facilitates data sharing and transparency. The automatable compMS2Miner workflow consists of the following steps: (i) matching unknown MS1 features to precursor MS2 scans; (ii) filtration of spectral noise (dynamic noise filter); (iii) generation of composite mass spectra by summing the signals of multiple similar spectra and removing redundant/contaminant spectra; (iv) interpretation of possible fragment ion substructures using an internal database; (v) annotation of unknowns with chemical and spectral databases, with prediction of mammalian biotransformation metabolites, wrapper functions for in silico fragmentation software, nearest neighbor chemical similarity scoring, random forest based retention time prediction, text-mining based false positive removal/true positive ranking, chemical taxonomic prediction, and differential evolution based global annotation score optimization; and (vi) network graph visualization, data curation, and sharing via the compMS2Explorer application.
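Steps (ii) and (iii) can be illustrated with a small sketch. The code below is not taken from the package (which is written in R); it is a simplified Python illustration under stated assumptions. The function names `dynamic_noise_filter` and `composite_spectrum`, the signal-to-noise factor of 3, and the 0.01 m/z bin width are all hypothetical choices. It also assumes the spectra passed in have already been judged similar, and it omits the redundant/contaminant spectrum removal mentioned in step (iii).

```python
from collections import defaultdict


def dynamic_noise_filter(peaks, snr=3.0):
    """Drop low-intensity peaks using a noise level derived from the spectrum itself.

    peaks: list of (mz, intensity) tuples.
    Peaks are ranked by ascending intensity. A least-squares line through the
    k weakest peaks predicts what peak k+1 would look like if it were noise.
    The first peak exceeding snr * prediction sets the signal threshold.
    Assumption: if no such peak exists, the whole spectrum is treated as noise.
    """
    ranked = sorted(peaks, key=lambda p: p[1])
    for k in range(2, len(ranked)):
        xs = list(range(1, k + 1))
        ys = [p[1] for p in ranked[:k]]
        mean_x = sum(xs) / k
        mean_y = sum(ys) / k
        sxx = sum((x - mean_x) ** 2 for x in xs)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        slope = sxy / sxx
        predicted = mean_y + slope * (k + 1 - mean_x)
        if ranked[k][1] > snr * predicted:
            threshold = ranked[k][1]
            return sorted(p for p in peaks if p[1] >= threshold)
    return []


def composite_spectrum(spectra, bin_width=0.01):
    """Sum intensities of peaks that fall into the same m/z bin across spectra."""
    bins = defaultdict(float)
    for spectrum in spectra:
        for mz, intensity in spectrum:
            bins[round(mz / bin_width)] += intensity
    return [(round(b * bin_width, 4), total) for b, total in sorted(bins.items())]


# Two made-up MS2 scans of the same precursor (illustrative values only).
scan_1 = [(50.01, 10), (60.02, 11), (70.03, 12), (80.04, 13),
          (104.1071, 500), (184.0733, 900)]
scan_2 = [(55.01, 9), (65.02, 10), (75.03, 11), (85.04, 12),
          (104.1068, 400), (184.0741, 1000)]

filtered = [dynamic_noise_filter(s) for s in (scan_1, scan_2)]
composite = composite_spectrum(filtered)
print(composite)
```

Filtering each scan before summation keeps low-level noise from accumulating in the composite; whether the package applies the two steps in exactly this way is not stated in the abstract.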
Metabolite identities and comments can also be recorded using an interactive table within compMS2Explorer. The utility of the package is illustrated with a data set of blood serum samples from 7 diet-induced obese (DIO) and 7 nonobese (NO) C57BL/6J mice, which were also treated with an antibiotic (streptomycin) to knock down the gut microbiota. The results of fully autonomous and objective usage of compMS2Miner are presented here. All automatically annotated spectra output by the workflow are provided in the Supporting Information and can alternatively be explored as publicly available compMS2Explorer applications for both positive and negative modes (https://wmbedmands.shinyapps.io/compMS2_mouseSera_POS and https://wmbedmands.shinyapps.io/compMS2_mouseSera_NEG). The workflow provided rapid annotation of a diversity of endogenous and gut microbially derived metabolites affected by both diet and antibiotic treatment, which conformed to previously published reports. Composite spectra (n = 173) were autonomously matched to entries of the MassBank of North America (MoNA) spectral repository. These experimental and virtual (LipidBlast) spectra corresponded to 29 common endogenous compound classes (e.g., 51 lysophosphatidylcholine spectra) and were then used to calculate the ranking capability of 7 individual scoring metrics. It was found that an average of the 7 individual scoring metrics provided the most effective weighted average ranking ability of 3 for the MoNA-matched spectra, in spite of the potential risk of false positive annotations emerging from automation. Minor structural differences, such as relative carbon-carbon double bond positions, were found in several cases to affect the correct rank of the MoNA-annotated metabolite.
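The ranking evaluation can be pictured with a short sketch. It is an assumption-laden illustration, not the authors' implementation: the helper names `rank_of_truth` and `mean_rank` are hypothetical, only three made-up metrics are used instead of seven, scores are assumed to lie in [0, 1] with higher being better, and a plain (unweighted) mean of ranks is computed because the abstract does not say how the weighted average was formed.

```python
from statistics import mean


def rank_of_truth(candidates, truth):
    """Return the 1-based rank of the known annotation among all candidates.

    candidates: dict mapping candidate name -> list of per-metric scores.
    The combined score is the plain average of the individual metrics.
    Ties keep dictionary insertion order (Python's sort is stable).
    """
    combined = {name: mean(scores) for name, scores in candidates.items()}
    ordered = sorted(combined, key=combined.get, reverse=True)
    return ordered.index(truth) + 1


def mean_rank(evaluations):
    """Average the rank of the known annotation over several spectra."""
    return mean([rank_of_truth(cands, truth) for cands, truth in evaluations])


# Made-up scores for two spectra whose correct annotation is known.
spectrum_a = ({"candidate_1": [0.9, 0.8, 0.7],
               "candidate_2": [0.6, 0.9, 0.5],
               "candidate_3": [0.2, 0.3, 0.1]}, "candidate_1")
spectrum_b = ({"candidate_1": [0.5, 0.5, 0.5],
               "candidate_2": [0.9, 0.7, 0.8],
               "candidate_3": [0.6, 0.7, 0.8]}, "candidate_1")

print(rank_of_truth(*spectrum_a))          # correct annotation ranked first
print(rank_of_truth(*spectrum_b))          # correct annotation ranked last
print(mean_rank([spectrum_a, spectrum_b]))
```

A lower mean rank means the correct annotation tends to appear nearer the top of the candidate list, which is the sense in which the abstract reports a ranking ability of 3.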
The latest release and an example workflow are available in the package vignette (https://github.com/WMBEdmands/compMS2Miner), and a version of the published application is available on the shinyapps.io site (https://wmbedmands.shinyapps.io/compMS2Example).
