Li Lu, Hoefsloot Huub, Bakker Barbara M, Horner David, Rasmussen Morten A, Smilde Age K, Acar Evrim
School of Mathematics (Zhuhai), Sun Yat-sen University, Zhuhai 519000, China.
Department of Data Science and Knowledge Discovery, Simula Metropolitan Center for Digital Engineering, 0130 Oslo, Norway.
Metabolites. 2024 Dec 24;15(1):2. doi: 10.3390/metabo15010002.
Background/Objectives: Metabolomics measurements are noisy and are often characterized by small sample sizes and missing entries. While data-driven methods have shown promise in analyzing metabolomics data, e.g., revealing biomarkers of various phenotypes, metabolomics data analysis can benefit significantly from incorporating prior information about metabolic mechanisms. This paper introduces a novel data analysis approach that incorporates mechanistic models into metabolomics data analysis.
Methods: We arranged time-resolved metabolomics measurements of plasma samples collected during a meal challenge test from the COPSAC cohort as a third-order tensor: subjects by metabolites by time samples. Simulated challenge test data generated using a human whole-body metabolic model were also arranged as a third-order tensor: virtual subjects by metabolites by time samples. Real and simulated data sets were coupled in the metabolites mode and jointly analyzed using coupled tensor factorizations to reveal the underlying patterns.
Results: In males, the joint analysis of simulated and real data performed better in terms of pattern discovery, achieving higher correlations with a BMI (body mass index)-related phenotype than the analysis of real data alone; in females, the performance was comparable. We also demonstrated the advantages of such a joint analysis approach in the presence of incomplete measurements and its limitations in the presence of wrong prior information.
Conclusions: The joint analysis of real measurements and simulated data (generated using a mechanistic model) through coupled tensor factorizations guides real data analysis with prior information encapsulated in mechanistic models and reveals interpretable patterns.
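The coupling idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a plain CP (CANDECOMP/PARAFAC) model for each tensor, a hard coupling in which both tensors share exactly the same factor matrix in their second mode (assumed here to be the metabolites mode), complete data with no missing entries, and an alternating least squares solver. The function names `cp_to_tensor` and `coupled_cp_als` are hypothetical and chosen for illustration only.

```python
import numpy as np


def cp_to_tensor(A, B, C):
    """Rebuild a third-order tensor from CP factor matrices (one column per component)."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)


def coupled_cp_als(X, Y, rank, n_iter=500, seed=0):
    """Fit X ~ [[A, B, C]] and Y ~ [[D, B, E]] with a shared mode-2 factor B.

    X: real data, assumed layout subjects x metabolites x time.
    Y: simulated data, assumed layout virtual subjects x metabolites x time.
    Illustrative sketch only: no missing-data handling, no constraints.
    """
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    L, J2, M = Y.shape
    if J != J2:
        raise ValueError("X and Y must have the same size in the coupled mode")

    # Random initialisation of all factor matrices.
    A, B, C = (rng.standard_normal((n, rank)) for n in (I, J, K))
    D, E = (rng.standard_normal((n, rank)) for n in (L, M))

    for _ in range(n_iter):
        # Uncoupled factors: ordinary CP least-squares updates per tensor.
        A = np.einsum("ijk,jr,kr->ir", X, B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        C = np.einsum("ijk,ir,jr->kr", X, A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
        D = np.einsum("ljm,jr,mr->lr", Y, B, E) @ np.linalg.pinv((B.T @ B) * (E.T @ E))
        E = np.einsum("ljm,lr,jr->mr", Y, D, B) @ np.linalg.pinv((D.T @ D) * (B.T @ B))

        # Shared factor: both tensors contribute to the same normal equations,
        # which is how the simulated data informs the patterns in the real data.
        rhs = np.einsum("ijk,ir,kr->jr", X, A, C) + np.einsum("ljm,lr,mr->jr", Y, D, E)
        gram = (A.T @ A) * (C.T @ C) + (D.T @ D) * (E.T @ E)
        B = rhs @ np.linalg.pinv(gram)

    return A, B, C, D, E
```

In this sketch the shared factor `B` is the only link between the two data sets: each column of `B` is a metabolite pattern that must explain both the real and the simulated tensor, while the subject and time factors remain specific to each data set. A practical analysis would additionally need preprocessing, a treatment of missing entries, and a choice of the number of components, none of which is shown here.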