Fan Sili, Wilson Christopher M, Fridley Brooke L, Li Qian
Graduate Group of Biostatistics, University of California, Davis, CA, USA.
Department of Biostatistics and Bioinformatics, Moffitt Cancer Center, Tampa, FL, USA.
Methods Mol Biol. 2023;2629:247-269. doi: 10.1007/978-1-0716-2986-4_12.
In this chapter, we review cutting-edge statistical and machine learning methods for missing value imputation, normalization, and downstream analyses in mass spectrometry metabolomics studies, illustrated with example datasets. Methods for recovering missing peaks include simple imputation by zero or by the limit of detection, regression-based or distribution-based imputation, and prediction by random forest. Batch effects can be removed by data-driven normalization, internal standard-based normalization, or quality control sample-based normalization. We also summarize different types of statistical analysis relating metabolomics to clinical outcomes, such as inference on metabolic biomarkers, clustering of metabolomic profiles, metabolite module building, and integrative analysis with the transcriptome.
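As a rough illustration of the simplest options named in the abstract, the following Python sketch imputes missing peaks by zero or by a limit-of-detection value and then applies one data-driven normalization. It is not the chapter's implementation. The function names `impute_simple` and `median_normalize` are hypothetical, using half of the minimum observed intensity as a stand-in for the limit of detection is an assumption, and median scaling is only one possible choice of data-driven normalization.

```python
import math
from statistics import median


def impute_simple(rows, method="lod"):
    """Fill missing peaks (NaN) in a samples-by-metabolites table.

    method="zero": replace missing values with 0.
    method="lod":  replace missing values with half of the minimum observed
                   intensity of that metabolite (an assumed proxy for the
                   limit of detection, not a value taken from the chapter).
    """
    n_cols = len(rows[0])
    fills = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if not math.isnan(r[j])]
        if method == "zero":
            fills.append(0.0)
        elif method == "lod":
            if not observed:
                raise ValueError(f"metabolite {j} has no observed values")
            fills.append(min(observed) / 2.0)
        else:
            raise ValueError(f"unknown method: {method}")
    return [
        [fills[j] if math.isnan(v) else float(v) for j, v in enumerate(r)]
        for r in rows
    ]


def median_normalize(rows):
    """Scale each sample so that its median matches the median of all
    sample medians. Assumes every sample median is strictly positive."""
    sample_medians = [median(r) for r in rows]
    target = median(sample_medians)
    return [[v * target / m for v in r] for r, m in zip(rows, sample_medians)]


nan = float("nan")
# Toy table: 3 samples (rows) by 3 metabolites (columns), made up for illustration.
X = [
    [10.0, nan, 4.0],
    [20.0, 6.0, nan],
    [30.0, 8.0, 2.0],
]
imputed = impute_simple(X, method="lod")
normalized = median_normalize(imputed)
print(imputed)
print(normalized)
```

Zero imputation can push a sample median to zero, in which case this scaling step is undefined. Regression-based, distribution-based, and random forest imputation, as well as internal standard-based and quality control sample-based normalization, need more machinery than this sketch covers.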