Bararpour Nasim, Gilardi Federica, Carmeli Cristian, Sidibe Jonathan, Ivanisevic Julijana, Caputo Tiziana, Augsburger Marc, Grabherr Silke, Desvergne Béatrice, Guex Nicolas, Bochud Murielle, Thomas Aurelien
Unit of Forensic Toxicology and Chemistry, CURML, Lausanne University Hospital-Geneva University Hospitals, Lausanne-Geneva, Switzerland.
Faculty Unit of Toxicology, CURML, Lausanne University Hospital, Faculty of Biology and Medicine, University of Lausanne, Lausanne, Switzerland.
Sci Rep. 2021 Mar 11;11(1):5657. doi: 10.1038/s41598-021-84824-3.
As a powerful phenotyping technology, metabolomics provides new opportunities for biomarker discovery through metabolome-wide association studies (MWAS) and for identifying metabolites that have a regulatory effect in various biological processes. While mass spectrometry (MS)-based metabolomics assays offer high throughput and sensitivity, MWAS require long-term data acquisition, which generates analytical signal drift over time that can obscure genuine, biologically relevant changes. We developed "dbnorm", a package in the R environment, which allows for an easy comparison of the performance of advanced statistical models commonly used in metabolomics to remove batch effects from large metabolomics datasets. "dbnorm" integrates advanced statistical tools to inspect the dataset structure not only at the macroscopic (sample batches) scale, but also at the microscopic (metabolic features) level. To compare model performance on data correction, "dbnorm" assigns a score that helps users identify the best-fitting model for each dataset. In this study, we applied "dbnorm" to two large-scale metabolomics datasets as a proof of concept. We demonstrate that "dbnorm" allows for the accurate selection of the most appropriate statistical tool to efficiently remove the signal drift that accumulates over time and to focus on the relevant biological components of complex datasets.
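The abstract does not say how the score is computed, so the following is only a minimal illustrative sketch in Python, not the "dbnorm" implementation or its API. It assumes, purely for illustration, that batch dependence is measured per metabolic feature as the adjusted R² of a one-way model (intensity explained by batch), and that a dataset is scored by the worst-case feature. The function names, the toy mean-centering correction, and the data are all hypothetical.

```python
from statistics import mean

def adjusted_r2(values, batches):
    """Adjusted R^2 of a one-way model: feature intensity ~ batch (assumed metric)."""
    n = len(values)
    groups = {}
    for v, b in zip(values, batches):
        groups.setdefault(b, []).append(v)
    k = len(groups)
    grand = mean(values)
    ss_total = sum((v - grand) ** 2 for v in values)
    if ss_total == 0 or n - k <= 0:
        return 0.0
    # Residual sum of squares: spread of each value around its own batch mean.
    ss_resid = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    r2 = 1 - ss_resid / ss_total
    return 1 - (1 - r2) * (n - 1) / (n - k)

def center_by_batch(values, batches):
    """Toy correction (hypothetical): subtract each batch mean, restore the grand mean."""
    grand = mean(values)
    batch_mean = {
        b: mean([v for v, bb in zip(values, batches) if bb == b])
        for b in set(batches)
    }
    return [v - batch_mean[b] + grand for v, b in zip(values, batches)]

def batch_score(features, batches):
    """Worst-case batch dependence across features; lower means less batch effect."""
    return max(adjusted_r2(f, batches) for f in features)

# Hypothetical data: two features measured in two batches of three samples each.
batches = ["b1", "b1", "b1", "b2", "b2", "b2"]
raw = [
    [10.0, 11.0, 12.0, 20.0, 21.0, 22.0],  # strong batch shift
    [5.0, 6.0, 7.0, 5.5, 6.5, 7.5],        # weak batch shift
]
corrected = [center_by_batch(f, batches) for f in raw]

print("raw score:      ", batch_score(raw, batches))
print("corrected score:", batch_score(corrected, batches))
```

Under these assumptions, a correction model that leaves a lower score has removed more of the batch-associated variability. Comparing several candidate models on the same dataset in this way mirrors, in spirit only, the model comparison the abstract describes.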