Addressing the batch effect issue for LC/MS metabolomics data in data preprocessing.

Affiliations

School of Software Engineering, Tongji University, Shanghai, 201804, China.

Department of Environmental Medicine and Public Health, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA.

Publication information

Sci Rep. 2020 Aug 17;10(1):13856. doi: 10.1038/s41598-020-70850-0.

DOI:10.1038/s41598-020-70850-0

PMID:32807888

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7431853/

Abstract

With the growth of metabolomics research, more and more studies are conducted on large numbers of samples. Due to technical limitations of the Liquid Chromatography-Mass Spectrometry (LC/MS) platform, samples often need to be processed in multiple batches. Across different batches, we often observe differences in data characteristics. In this work, we specifically focus on data generated in multiple batches on the same LC/MS machinery. Traditional preprocessing methods treat all samples as a single group. Such practice can result in errors in the alignment of peaks, which cannot be corrected by post hoc application of batch effect correction methods. In this work, we developed a new approach that addresses the batch effect issue in the preprocessing stage, resulting in better peak detection, alignment and quantification. It can be combined with downstream batch effect correction methods to further correct for between-batch intensity differences. The method is implemented in the existing workflow of the apLCMS platform. In analyses of data with multiple batches, both generated from standardized quality control (QC) plasma samples and from real biological studies, the new method resulted in feature tables with better consistency, as well as better downstream analysis results. The method can be a useful addition to the tools available for large studies involving multiple batches. The method is available as part of the apLCMS package. Download link and instructions are at https://mypage.cuhk.edu.cn/academics/yutianwei/apLCMS/.
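The alignment problem described in the abstract can be illustrated with a toy sketch. This is not the apLCMS algorithm, whose details are not given here. It is a minimal example built on assumptions: the peak data, the 5-second tolerance `RT_TOL`, the greedy gap-based grouping, and the retention-order shift estimate are all hypothetical choices made for illustration. Two compounds share one m/z, and batch 2 is simulated with a retention-time drift of about 12 seconds, so pooling all samples merges compound B from batch 1 with compound A from batch 2. Grouping within each batch first and then correcting the between-batch shift keeps the two compounds apart.

```python
from statistics import mean, median

RT_TOL = 5.0  # seconds; hypothetical grouping tolerance

# Toy peaks at a single m/z: (batch, true compound, retention time in s).
# Batch 2 is simulated with a retention-time drift of roughly +12 s.
PEAKS = [
    (1, "A", 100.0), (1, "A", 100.6), (1, "B", 112.0), (1, "B", 112.4),
    (2, "A", 112.2), (2, "A", 111.8), (2, "B", 124.1), (2, "B", 123.9),
]


def group_by_rt(peaks, tol=RT_TOL):
    """Greedy grouping: sort by RT and start a new group whenever the
    gap to the previous peak exceeds tol."""
    groups = []
    for peak in sorted(peaks, key=lambda p: p[2]):
        if groups and peak[2] - groups[-1][-1][2] <= tol:
            groups[-1].append(peak)
        else:
            groups.append([peak])
    return groups


def compounds_per_group(groups):
    """List the true compound labels found in each aligned feature."""
    return [sorted({p[1] for p in g}) for g in groups]


def align_pooled(peaks):
    """Treat all samples as a single group, ignoring batch membership."""
    return group_by_rt(peaks)


def align_batchwise(peaks, reference_batch=1):
    """Two-stage sketch: group within each batch, then remove the
    between-batch RT shift before grouping across batches.

    Assumption: every batch detects the same features in the same
    retention order, so features can be paired by rank.
    """
    batches = sorted({p[0] for p in peaks})
    # Stage 1: within-batch grouping, one mean RT per batch-level feature.
    batch_rts = {}
    for b in batches:
        groups = group_by_rt([p for p in peaks if p[0] == b])
        batch_rts[b] = [mean(p[2] for p in g) for g in groups]
    # Stage 2: estimate each batch's shift relative to the reference
    # batch, subtract it, then group the corrected peaks together.
    ref = batch_rts[reference_batch]
    corrected = []
    for b in batches:
        shift = median(x - r for x, r in zip(batch_rts[b], ref))
        corrected += [(p[0], p[1], p[2] - shift) for p in peaks if p[0] == b]
    return group_by_rt(corrected)


pooled = compounds_per_group(align_pooled(PEAKS))
batchwise = compounds_per_group(align_batchwise(PEAKS))
print("pooled:   ", pooled)
print("batchwise:", batchwise)
```

In this toy setting the pooled alignment yields three features, one of which mixes both compounds, while the batch-wise version yields two clean features. A between-batch intensity correction applied afterwards could not undo the mixed feature, which is the point the abstract makes about post hoc correction.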

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a476/7431853/89631876696c/41598_2020_70850_Fig1_HTML.jpg
