Translational Laboratory and Biorepository, University of California, Irvine School of Medicine, Irvine, CA 92697, United States.
Department of Anatomy & Neurobiology, University of California, Irvine School of Medicine, Irvine, CA 92697, United States.
Curr Top Med Chem. 2018;18(11):883-895. doi: 10.2174/1568026618666180711144323.
Contemporary metabolomics experiments generate a rich array of complex high-dimensional data. Consequently, there have been concurrent efforts to develop methodological standards and analytical workflows to streamline the generation of meaningful biochemical and clinical inferences from raw data generated using an analytical platform like mass spectrometry. While such considerations have been frequently addressed in untargeted metabolomics (i.e., the broad survey of all distinguishable metabolites within a sample of interest), this methodological scrutiny has seldom been applied to data generated using commercial, targeted metabolomics kits. We suggest that this may, in part, account for past and more recent incomplete replications of previously specified biomarker panels. Herein, we identify common impediments challenging the analysis of raw, targeted metabolomic abundance data from a commercial kit and review methods to remedy these issues. In doing so, we propose an analytical pipeline suitable for the pre-processing of data for downstream biomarker discovery. Operational and statistical considerations for integrating targeted data sets across experimental sites and analytical batches are discussed, as are best practices for developing predictive models relating pre-processed metabolomic data to associated phenotypic information.