Université Clermont Auvergne, INRA, UNH, Plateforme d'Exploration du Métabolisme, MetaboHUB Clermont, 63000, Clermont-Ferrand, France.
Centre de Recherche du Centre hospitalier de l'Université de Montréal, Montréal, Canada.
Metabolomics. 2019 Oct 3;15(10):134. doi: 10.1007/s11306-019-1598-y.
Metabolomics is a powerful phenotyping tool in nutrition and health research. It generates complex data that require dedicated processing to enrich knowledge of biological systems. In particular, to investigate relations between environmental factors, phenotypes and metabolism, discriminant statistical analyses are generally performed separately on each metabolomic dataset and then complemented by associations with metadata. Another relevant strategy is to analyse thematic data blocks simultaneously with a multi-block partial least squares discriminant analysis (MBPLSDA), which makes it possible to determine the importance of variables and blocks in discriminating groups of subjects while taking the data structure into account.
The objective was to develop a complete, open-source, standalone tool covering all steps of MBPLSDA for the joint analysis of metabolomic and epidemiological data.
The tool is based on the mbpls function of the ade4 R package, extended with additional functionalities, including some dedicated to discriminant analysis. The indicators it provides help to determine the optimal number of components, to check the validity of the MBPLSDA model, and to evaluate the variability of its parameters and predictions.
To illustrate the potential of this tool, MBPLSDA was applied to a real case study involving metabolomic, nutritional and clinical data from a human cohort. Having these functionalities available in a single R package made it possible to optimize parameters for an efficient joint analysis of metabolomic and epidemiological data and to obtain new insights into multidimensional phenotypes.
In particular, we highlighted the impact of filtering the metabolomic variables beforehand, and the relevance of an MBPLSDA approach compared with a standard PLS discriminant analysis method.
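The multi-block idea described above can be illustrated with a minimal sketch. It is not the ade4 mbpls algorithm or the tool presented here. It assumes a simplified setting in which each block is standardized and down-weighted so that all blocks carry the same total variance, the group labels are dummy-coded, and only the first PLS component is extracted as the dominant singular vector of the cross-product between the concatenated blocks and the indicator matrix. The function names (`dummy_code`, `scale_block`, `first_component`) and the block importance measure (share of squared weights per block) are hypothetical choices made for illustration.

```python
import numpy as np


def dummy_code(labels):
    """One-hot encode class labels into an indicator matrix Y."""
    classes = sorted(set(labels))
    Y = np.array([[1.0 if lab == c else 0.0 for c in classes] for lab in labels])
    return Y, classes


def scale_block(X):
    """Standardize columns, then divide by sqrt(number of columns)
    so that every block has the same total variance (assumed weighting)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    sd = Xc.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0  # leave constant columns at zero instead of dividing by 0
    return (Xc / sd) / np.sqrt(X.shape[1])


def first_component(blocks, labels):
    """First discriminant component over concatenated, block-weighted data."""
    Y, _ = dummy_code(labels)
    Yc = Y - Y.mean(axis=0)
    scaled = [scale_block(X) for X in blocks]
    X = np.hstack(scaled)

    # First PLS weight vector: dominant left singular vector of X'Y (unit norm).
    U, _, _ = np.linalg.svd(X.T @ Yc, full_matrices=False)
    w = U[:, 0]
    scores = X @ w

    # Illustrative block importance: share of squared weights per block.
    bounds = np.cumsum([0] + [b.shape[1] for b in scaled])
    importance = [float(np.sum(w[a:b] ** 2)) for a, b in zip(bounds[:-1], bounds[1:])]
    return scores, w, importance


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels = ["case"] * 20 + ["control"] * 20
    shift = np.array([3.0] * 20 + [0.0] * 20)[:, None]
    metabolites = rng.normal(size=(40, 5)) + shift  # informative toy block
    diet = rng.normal(size=(40, 3))                 # uninformative toy block
    scores, w, importance = first_component([metabolites, diet], labels)
    print("block importance:", importance)
```

Because the weight vector has unit norm, the per-block shares sum to one, which makes them easy to compare across blocks. A full analysis would extract further components by deflation and validate the model by cross-validation and permutation, as the abstract describes.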