Revilla Lluís, Mayorgas Aida, Corraliza Ana M, Masamunt Maria C, Metwaly Amira, Haller Dirk, Tristán Eva, Carrasco Anna, Esteve Maria, Panés Julian, Ricart Elena, Lozano Juan J, Salas Azucena
Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), Barcelona, Spain.
Department of Gastroenterology, IDIBAPS, Hospital Clínic, Barcelona, Spain.
PLoS One. 2021 Feb 8;16(2):e0246367. doi: 10.1371/journal.pone.0246367. eCollection 2021.
Personalized medicine requires identifying relationships between the variables that influence a patient's phenotype and using them to predict an outcome. Sparse generalized canonical correlation analysis (SGCCA) identifies relationships between different groups (blocks) of variables, but it requires specifying a model of the expected interactions between those blocks. Describing these interactions is challenging when the relationships are unknown or when there is no pre-established hypothesis. Our aim was therefore to develop a method to find the relationships between microbiome data, host transcriptome data and the relevant clinical variables in a complex disease such as Crohn's disease.
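In SGCCA the "model" of expected interactions is commonly encoded as a symmetric design matrix C, where the entry C[j, k] (between 0 and 1) states how strongly blocks j and k are assumed to be connected, and 0 means no direct link is modeled. When there is no prior hypothesis, one option is to enumerate candidate designs and compare them afterwards. The minimal Python sketch below does only the enumeration; the function name `candidate_designs`, the block order and the allowed weights are illustrative assumptions, not the authors' code.

```python
import itertools

import numpy as np


def candidate_designs(n_blocks, weights=(0.0, 1.0)):
    """Enumerate symmetric design matrices with a zero diagonal.

    C[j, k] > 0 means blocks j and k are assumed to be related (with that
    strength); 0 means no direct relationship between them is modeled.
    """
    pairs = list(itertools.combinations(range(n_blocks), 2))
    designs = []
    for values in itertools.product(weights, repeat=len(pairs)):
        if not any(values):
            continue  # skip the empty model: it describes no interactions
        C = np.zeros((n_blocks, n_blocks))
        for (j, k), w in zip(pairs, values):
            C[j, k] = C[k, j] = w  # keep the matrix symmetric
        designs.append(C)
    return designs


# Hypothetical block order: 0 = transcriptome, 1 = microbiome, 2 = clinical.
designs = candidate_designs(3)
print(len(designs))  # prints 7 (2**3 binary patterns minus the empty one)
```

With `w` allowed weights and `B` blocks this yields `w**(B*(B-1)/2) - 1` designs, so the search space grows quickly as blocks or weight levels are added.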
We present here a method, based on canonical correlation analysis, to identify these interactions. Using a dataset of Crohn's disease patients with longitudinal sampling, we show that the model is the most important factor for identifying relationships between blocks. The analysis was first tested on two previously published datasets, a glioma dataset and a Crohn's disease and ulcerative colitis dataset, on which we describe how to select the optimal parameters. Using these parameters, we analyzed our own Crohn's disease dataset. We selected the model with the highest inner average variance explained (AVE) to identify relationships between the transcriptome, the gut microbiome and clinically relevant variables. Adding the clinically relevant variables improved the average variance explained by the model compared with multiple co-inertia analysis.
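Selecting the design with the highest inner average variance explained can be sketched as follows. Assuming the definition used in the RGCCA literature, inner AVE is the design-weighted mean of the squared correlations between the components of connected blocks. To keep the example short and self-contained, it swaps in a plain one-component RGCCA (tau = 1, Horst scheme, no sparsity penalty) for SGCCA and runs on synthetic data; the function names, design names and data are hypothetical, not the authors' pipeline.

```python
import numpy as np


def standardize(X):
    """Center each column and scale it to unit variance."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)


def fit_rgcca_horst(blocks, C, n_iter=500, tol=1e-10):
    """One-component RGCCA with tau = 1 and the Horst scheme (no sparsity).

    Maximizes sum_{j<k} C[j, k] * cov(X_j a_j, X_k a_k) subject to ||a_j|| = 1
    by block-wise updates, and returns the component scores y_j = X_j a_j.
    """
    # Start each block from its first right singular vector.
    weights = [np.linalg.svd(X, full_matrices=False)[2][0] for X in blocks]
    scores = [X @ a for X, a in zip(blocks, weights)]
    previous = -np.inf
    for _ in range(n_iter):
        for j, X in enumerate(blocks):
            # Inner component: design-weighted sum of the linked blocks' scores.
            z = sum(C[j, k] * scores[k] for k in range(len(blocks)) if k != j)
            grad = X.T @ z
            norm = np.linalg.norm(grad)
            if norm > 0:  # a block with no links keeps its current weights
                weights[j] = grad / norm
                scores[j] = X @ weights[j]
        crit = sum(C[j, k] * np.cov(scores[j], scores[k])[0, 1]
                   for j in range(len(blocks)) for k in range(j + 1, len(blocks)))
        if abs(crit - previous) < tol:
            break
        previous = crit
    return scores


def inner_ave(scores, C):
    """Design-weighted mean of squared correlations between linked components."""
    num = den = 0.0
    for j in range(len(scores)):
        for k in range(j + 1, len(scores)):
            r = np.corrcoef(scores[j], scores[k])[0, 1]
            num += C[j, k] * r ** 2
            den += C[j, k]
    return num / den


def select_design(blocks, designs):
    """Fit every candidate design; return the best name and all inner AVEs."""
    blocks = [standardize(X) for X in blocks]
    aves = {name: inner_ave(fit_rgcca_horst(blocks, C), C)
            for name, C in designs.items()}
    return max(aves, key=aves.get), aves


# Synthetic, hypothetical data: the two omics blocks share a latent signal,
# while the "clinical" block is pure noise.
rng = np.random.default_rng(0)
n = 60
latent = rng.normal(size=n)
transcriptome = np.outer(latent, rng.normal(size=20)) + rng.normal(size=(n, 20))
microbiome = np.outer(latent, rng.normal(size=15)) + rng.normal(size=(n, 15))
clinical = rng.normal(size=(n, 5))

designs = {
    "all linked": np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float),
    "clinical hub": np.array([[0, 0, 1], [0, 0, 1], [1, 1, 0]], dtype=float),
    "omics pair + clinical": np.array([[0, 1, 0.1], [1, 0, 0.1], [0.1, 0.1, 0]]),
}
best, aves = select_design([transcriptome, microbiome, clinical], designs)
print(best, aves)
```

Replacing the solver with a sparse one, such as the SGCCA implementation in the RGCCA R package, would only change `fit_rgcca_horst`; the scoring and selection steps stay the same.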
The methodology described herein provides a general framework for identifying interactions between sets of omic data and clinically relevant variables. Following this method, we found genes and microorganisms that were related to each other independently of the model, while others were specific to the model used. Thus, model selection proved crucial to finding the existing relationships in multi-omics datasets.