Beirnaert Charlie, Peeters Laura, Meysman Pieter, Bittremieux Wout, Foubert Kenn, Custers Deborah, Van der Auwera Anastasia, Cuykx Matthias, Pieters Luc, Covaci Adrian, Laukens Kris
Adrem Data Lab, Department of Mathematics and Computer Science, University of Antwerp, 2000 Antwerp, Belgium.
Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA.
Metabolites. 2019 Mar 20;9(3):54. doi: 10.3390/metabo9030054.
Data analysis for metabolomics is undergoing rapid progress thanks to the proliferation of novel tools and the standardization of existing workflows. As untargeted metabolomics datasets and experiments continue to increase in size and complexity, standardized workflows are often not sufficiently sophisticated. In addition, the ground truth for untargeted metabolomics experiments is intrinsically unknown, which makes the performance of tools difficult to evaluate. Here, the problem of dynamic multi-class metabolomics experiments was investigated using a simulated dataset with a known ground truth. This simulated dataset was used to evaluate the performance of tinderesting, a new and intuitive tool based on gathering expert knowledge to be used in machine learning. The results were compared to EDGE, a statistical method for time series data. This paper presents three novel outcomes. The first is a way to simulate dynamic metabolomics data with a known ground truth based on ordinary differential equations. This method is made available through the MetaboLouise R package. Second, the EDGE tool, originally developed for genomics data analysis, is shown to be highly performant in analyzing dynamic case vs. control metabolomics data. Third, the tinderesting method is introduced to analyze more complex dynamic metabolomics experiments. This tool consists of a Shiny app for collecting expert knowledge, which in turn is used to train a machine learning model to emulate the decision process of the expert. This approach does not replace traditional data analysis workflows for metabolomics, but can provide additional information, improved performance, or easier interpretation of results. The advantage is that the tool is agnostic to the complexity of the experiment and is thus easier to use in advanced setups. The code for the presented analysis, MetaboLouise, and tinderesting are all freely available.