Data Processing Thresholds for Abundance and Sparsity and Missed Biological Insights in an Untargeted Chemical Analysis of Blood Specimens for Exposomics

Affiliations

Department of Environmental Medicine and Public Health, Icahn School of Medicine at Mount Sinai, New York, NY, United States.

Publication information

Front Public Health. 2021 Jun 10;9:653599. doi: 10.3389/fpubh.2021.653599. eCollection 2021.

DOI:10.3389/fpubh.2021.653599

PMID:34178917

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC8222544/

Abstract

Untargeted chemical analysis of biofluids provides semi-quantitative data for thousands of chemicals, expanding our understanding of the relationships among metabolic pathways, diseases, phenotypes, and exposures. During the processing of mass spectral and chromatography data, various signal thresholds are used to control the number of peaks in the final data matrix used for statistical analyses. However, commonly used stringent thresholds generate constrained data matrices that may under-represent the detected chemical space, leading to missed biological insights in exposome research. We re-analyzed a liquid chromatography high-resolution mass spectrometry (LC/MS) data set from a publicly available epidemiology study (n = 499) of human cord blood samples using the MS-DIAL software, applying the lowest possible thresholds during the data processing steps. Peak lists for individual files, and the data matrix after the alignment and gap-filling steps, were summarized for different peak height and detection frequency thresholds. Correlations between birth weight and LC/MS peaks in the newly generated data matrix were computed using the Spearman correlation coefficient. MS-DIAL detected on average 23,156 peaks per individual LC/MS file and 63,393 peaks in the aligned peak table. The combination of peak height and detection frequency thresholds used in the original publication, at the individual-file and peak-alignment levels, can reject 90% of the peaks in the untargeted chemical analysis data set generated by MS-DIAL. Correlation analysis for birth weight suggested that up to 80% of the significantly associated peaks were rejected by the data processing thresholds used in the original publication. The re-analysis with the lowest possible thresholds recovered metabolic insights about C19 steroids and hydroxy-acyl-carnitines and their relationships with birth weight. Data processing thresholds for peak height and detection frequency, at both the individual-file and alignment levels, should be set as low as possible or avoided altogether when mining untargeted chemical analysis data in exposome research to discover new biomarkers and mechanisms.
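The effect described in the abstract can be illustrated with a minimal sketch. It is not the authors' MS-DIAL workflow: the peak names, peak heights, birth weights, and all three cutoffs below are hypothetical, and an absolute Spearman coefficient cutoff stands in for a formal significance test. The sketch applies a peak height threshold and a detection frequency threshold to a toy peak table, then reports what fraction of the outcome-associated peaks those thresholds would discard.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman coefficient = Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def passes_thresholds(heights, min_height, min_detect_freq):
    """A peak is kept only if its height reaches min_height in at least
    a fraction min_detect_freq of the samples."""
    detected = sum(1 for h in heights if h >= min_height)
    return detected / len(heights) >= min_detect_freq


def fraction_of_associated_rejected(peaks, outcome, min_height,
                                    min_detect_freq, rho_cutoff):
    """Among peaks with |rho| >= rho_cutoff, the share failing the thresholds."""
    associated = [name for name, h in peaks.items()
                  if abs(spearman(h, outcome)) >= rho_cutoff]
    if not associated:
        return 0.0
    rejected = [name for name in associated
                if not passes_thresholds(peaks[name], min_height, min_detect_freq)]
    return len(rejected) / len(associated)


# Hypothetical toy data: six samples, three peaks (0 = not detected).
birth_weight = [2900, 3100, 3300, 3500, 3700, 3900]
peaks = {
    "abundant_unrelated": [50000, 42000, 61000, 39000, 58000, 45000],
    "abundant_associated": [10000, 12000, 15000, 17000, 21000, 26000],
    "low_sparse_associated": [0, 0, 0, 300, 500, 800],
}

# Hypothetical cutoffs: height >= 1000 in >= 50% of samples, |rho| >= 0.8.
share = fraction_of_associated_rejected(peaks, birth_weight, 1000, 0.5, 0.8)
print(f"Associated peaks rejected by thresholds: {share:.0%}")
```

In this toy table the low-abundance, sparsely detected peak tracks birth weight closely yet never reaches the height cutoff, so it is filtered out before any statistics are run. This is the kind of loss the abstract reports at scale.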



Similar articles

Response: Commentary: Data processing thresholds for abundance and sparsity and missed biological insights in an untargeted chemical analysis of blood specimens for exposomics.

Front Public Health. 2022 Oct 18;10:1003148. doi: 10.3389/fpubh.2022.1003148. eCollection 2022.

Mass Spectral Feature List Optimizer (MS-FLO): A Tool To Minimize False Positive Peak Reports in Untargeted Liquid Chromatography-Mass Spectroscopy (LC-MS) Data Processing.

Anal Chem. 2017 Mar 21;89(6):3250-3255. doi: 10.1021/acs.analchem.6b04372. Epub 2017 Mar 6.

IDSL.IPA Characterizes the Organic Chemical Space in Untargeted LC/HRMS Data Sets.

J Proteome Res. 2022 Jun 3;21(6):1485-1494. doi: 10.1021/acs.jproteome.2c00120. Epub 2022 May 17.

MARS: A Multipurpose Software for Untargeted LC-MS-Based Metabolomics and Exposomics.

Anal Chem. 2024 Jan 30;96(4):1468-1477. doi: 10.1021/acs.analchem.3c03620. Epub 2024 Jan 18.

MetaClean: a machine learning-based classifier for reduced false positive peak detection in untargeted LC-MS metabolomics data.

Metabolomics. 2020 Oct 21;16(11):117. doi: 10.1007/s11306-020-01738-3.

Translational Metabolomics of Head Injury: Exploring Dysfunctional Cerebral Metabolism with Ex Vivo NMR Spectroscopy-Based Metabolite Quantification

IDSL.CSA: Composite Spectra Analysis for Chemical Annotation of Untargeted Metabolomics Datasets.

Anal Chem. 2023 Jun 27;95(25):9480-9487. doi: 10.1021/acs.analchem.3c00376. Epub 2023 Jun 13.

IDSL.UFA Assigns High-Confidence Molecular Formula Annotations for Untargeted LC/HRMS Data Sets in Metabolomics and Exposomics.

Anal Chem. 2022 Oct 4;94(39):13315-13322. doi: 10.1021/acs.analchem.2c00563. Epub 2022 Sep 22.

Data Processing for GC-MS- and LC-MS-Based Untargeted Metabolomics.

Methods Mol Biol. 2019;1978:287-299. doi: 10.1007/978-1-4939-9236-2_18.

Cited by

Multiparametric Optimization of Data-Dependent Acquisition Towards More Holistic Bacterial Metabolite Coverage Through Molecular Networking.

Int J Microbiol. 2025 Jul 21;2025:4388417. doi: 10.1155/ijm/4388417. eCollection 2025.

Multilaboratory Untargeted Mass Spectrometry Metabolomics Collaboration to Identify Bottlenecks and Comprehensively Annotate A Single Dataset.

Anal Chem. 2025 Aug 5;97(30):16110-16122. doi: 10.1021/acs.analchem.4c05577. Epub 2025 Jul 22.

Comprehensive Blood Metabolome and Exposome Analysis, Annotation, and Interpretation in E-Waste Workers.

Metabolites. 2024 Dec 2;14(12):671. doi: 10.3390/metabo14120671.

Picky with peakpicking: assessing chromatographic peak quality with simple metrics in metabolomics.

BMC Bioinformatics. 2023 Oct 28;24(1):404. doi: 10.1186/s12859-023-05533-4.

Response: Commentary: Data processing thresholds for abundance and sparsity and missed biological insights in an untargeted chemical analysis of blood specimens for exposomics.

Front Public Health. 2022 Oct 18;10:1003148. doi: 10.3389/fpubh.2022.1003148. eCollection 2022.

AI/ML-driven advances in untargeted metabolomics and exposomics for biomedical applications.

Cell Rep Phys Sci. 2022 Jul 20;3(7). doi: 10.1016/j.xcrp.2022.100978.

IDSL.IPA Characterizes the Organic Chemical Space in Untargeted LC/HRMS Data Sets.

J Proteome Res. 2022 Jun 3;21(6):1485-1494. doi: 10.1021/acs.jproteome.2c00120. Epub 2022 May 17.

CCDB: A database for exploring inter-chemical correlations in metabolomics and exposomics datasets.

Environ Int. 2022 Jun;164:107240. doi: 10.1016/j.envint.2022.107240. Epub 2022 Apr 18.

Commentary: Data Processing Thresholds for Abundance and Sparsity and Missed Biological Insights in an Untargeted Chemical Analysis of Blood Specimens for Exposomics.

Front Public Health. 2022 Jan 17;9:755837. doi: 10.3389/fpubh.2021.755837. eCollection 2021.
