
MetaClean: a machine learning-based classifier for reduced false positive peak detection in untargeted LC-MS metabolomics data.

Affiliations

Department of Genetics and Genomic Sciences and Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA.

Department of Environmental Medicine and Public Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA.

Publication information

Metabolomics. 2020 Oct 21;16(11):117. doi: 10.1007/s11306-020-01738-3.

DOI:10.1007/s11306-020-01738-3
PMID:33085002
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7895495/
Abstract

INTRODUCTION

Despite the availability of several pre-processing software tools, poor peak integration remains a prevalent problem in untargeted metabolomics data generated using liquid chromatography high-resolution mass spectrometry (LC-MS). As a result, the output of these tools may retain incorrectly calculated metabolite abundances that propagate into downstream analyses.

OBJECTIVES

To address this problem, we propose a computational methodology that combines machine learning and peak quality metrics to filter out low quality peaks.

METHODS

Specifically, we systematically compared the performance of 24 classifiers, generated by combining eight classification algorithms with three sets of peak quality metrics, on the task of distinguishing reliably integrated peaks from poorly integrated ones. These classifiers were compared with a relative standard deviation (RSD) cut-off applied to pooled quality-control (QC) samples, which aims to remove peaks with high analytical error.
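The RSD baseline mentioned above can be illustrated with a minimal sketch. It assumes RSD is read in its usual sense of relative standard deviation (sample standard deviation divided by the mean, as a percentage) and uses a 30% cut-off as the default. The function names, the peak identifiers, and the intensity values are hypothetical, and none of this is the MetaClean package's API.

```python
from statistics import mean, stdev


def rsd_percent(intensities):
    """Relative standard deviation in percent: 100 * sample SD / mean."""
    m = mean(intensities)
    if m == 0:
        # A zero mean makes the ratio undefined; treat it as failing any cut-off.
        return float("inf")
    return 100.0 * stdev(intensities) / m


def filter_by_rsd(qc_table, cutoff_percent=30.0):
    """Keep peaks whose RSD across pooled QC injections is at or below the cut-off."""
    return {
        peak: values
        for peak, values in qc_table.items()
        if rsd_percent(values) <= cutoff_percent
    }


# Hypothetical integrated intensities for two peaks across four pooled QC injections.
qc_table = {
    "peak_A": [100.0, 105.0, 95.0, 102.0],   # stable across injections
    "peak_B": [100.0, 300.0, 20.0, 150.0],   # highly variable
}

kept = filter_by_rsd(qc_table)
print(sorted(kept))
```

A filter of this kind only measures reproducibility across QC injections. A peak can be reproducibly mis-integrated and still pass, which is the gap the classifiers are meant to address.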

RESULTS

The best-performing classifier combined the AdaBoost algorithm with a set of 11 peak quality metrics previously explored in untargeted metabolomics and proteomics studies. As a complementary approach, applying our framework to peaks retained after filtering at 30% RSD across pooled QC samples further identified poorly integrated peaks that the RSD filter alone had not removed. An R implementation of these classifiers and the overall computational approach is available as the MetaClean package at https://CRAN.R-project.org/package=MetaClean.
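To make the idea of boosting over peak quality metrics concrete, here is a small, self-contained sketch of discrete AdaBoost with decision stumps in plain Python. It is an illustration only: the two features (a symmetry score and a signal-to-noise value), the toy data, and every function name are hypothetical. The abstract does not name the 11 metrics, and the MetaClean R package may implement its classifier quite differently.

```python
import math


def stump_predict(row, feature, threshold, sign):
    """Predict +1/-1 from a single feature and threshold."""
    return sign if row[feature] >= threshold else -sign


def train_stump(X, y, weights):
    """Return (weighted_error, feature, threshold, sign) of the best stump."""
    best = None
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            for sign in (1, -1):
                error = sum(
                    w for w, row, label in zip(weights, X, y)
                    if stump_predict(row, feature, threshold, sign) != label
                )
                if best is None or error < best[0]:
                    best = (error, feature, threshold, sign)
    return best


def adaboost_fit(X, y, n_rounds=5):
    """Fit discrete AdaBoost; labels must be +1 (good peak) or -1 (poor peak)."""
    n = len(X)
    weights = [1.0 / n] * n
    model = []
    for _ in range(n_rounds):
        error, feature, threshold, sign = train_stump(X, y, weights)
        # Clamp the error so the log below stays finite for a perfect stump.
        error = min(max(error, 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - error) / error)
        model.append((alpha, feature, threshold, sign))
        # Up-weight misclassified peaks, down-weight correct ones, then renormalise.
        weights = [
            w * math.exp(-alpha * label * stump_predict(row, feature, threshold, sign))
            for w, row, label in zip(weights, X, y)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model


def adaboost_predict(model, row):
    """Weighted vote of all stumps; +1 means 'reliably integrated'."""
    score = sum(a * stump_predict(row, f, t, s) for a, f, t, s in model)
    return 1 if score >= 0 else -1


# Hypothetical metrics per peak: [symmetry score, signal-to-noise].
X = [[0.9, 12.0], [0.8, 10.0], [0.85, 15.0], [0.3, 2.0], [0.4, 3.0], [0.2, 1.5]]
y = [1, 1, 1, -1, -1, -1]

model = adaboost_fit(X, y)
print(adaboost_predict(model, [0.88, 11.0]), adaboost_predict(model, [0.25, 2.5]))
```

The toy data are separable by a single threshold, so every boosting round selects the same stump. Real peak metrics would overlap between classes, and the reweighting step would then steer later stumps toward the peaks that earlier ones misclassified.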

CONCLUSION

Our work represents an important step forward in developing an automated tool for filtering out unreliable peak integrations in untargeted LC-MS metabolomics data.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4059/7895495/ee82bc42841c/nihms-1668282-f0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4059/7895495/412ebeca2a8c/nihms-1668282-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4059/7895495/1032aefa939d/nihms-1668282-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4059/7895495/f4d7fa431518/nihms-1668282-f0003.jpg
