Maryland Psychiatric Research Center, School of Medicine, University of Maryland, Baltimore, MD 21201, United States.
The University of Maryland Institute for Health Computing (UM-IHC), North Bethesda, MD 20852, United States.
Brief Bioinform. 2024 May 23;25(4). doi: 10.1093/bib/bbae288.
The advent of multimodal omics data has provided an unprecedented opportunity to systematically investigate underlying biological mechanisms from distinct yet complementary angles. However, the joint analysis of multi-omics data remains challenging because it requires modeling interactions between multiple sets of high-throughput variables. Furthermore, these interaction patterns may vary across different clinical groups, reflecting disease-related biological processes.
We propose a novel approach called Differential Canonical Correlation Analysis (dCCA) to capture differential covariation patterns between two multivariate vectors across clinical groups. Unlike classical Canonical Correlation Analysis, which maximizes the correlation between two multivariate vectors, dCCA aims to maximally recover multivariate-to-multivariate covariation patterns that differ between groups. We have developed computational algorithms and a toolkit that sparsely select paired subsets of variables from the two sets of multivariate variables while maximizing the differential covariation. Extensive simulation analyses demonstrate the superior performance of dCCA in selecting variables of interest and recovering differential correlations. We applied dCCA to the Pan-Kidney cohort from The Cancer Genome Atlas (TCGA) Program database and identified differentially expressed covariations between noncoding RNAs and gene expression.
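To make the contrast with classical CCA concrete, the following is a minimal illustrative sketch, not the authors' algorithm and not the API of the dCCA R package. It assumes a simplified, non-sparse surrogate objective: find unit vectors u and v that maximize |u' (R1 - R2) v|, where R1 and R2 are the cross-correlation matrices between the two variable sets in each clinical group. Under that assumption the solution is the leading singular pair of R1 - R2, computed here by power iteration. All function names (`simulate`, `cross_corr`, `leading_singular_pair`) and the simulated data are hypothetical.

```python
import math
import random


def simulate(n, p, q, linked, rng):
    """Draw n samples of X (p variables) and Y (q variables).

    If `linked` is True, X[0] and Y[0] share a latent factor, so they are
    correlated (about 0.8) in that group only.
    """
    X, Y = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)
        x = [rng.gauss(0, 1) for _ in range(p)]
        y = [rng.gauss(0, 1) for _ in range(q)]
        if linked:
            x[0] += 2 * z
            y[0] += 2 * z
        X.append(x)
        Y.append(y)
    return X, Y


def standardize_columns(M):
    """Return the columns of M, each centered and scaled to unit variance."""
    n = len(M)
    cols = []
    for c in zip(*M):
        mu = sum(c) / n
        sd = math.sqrt(sum((v - mu) ** 2 for v in c) / (n - 1))
        cols.append([(v - mu) / sd for v in c])
    return cols


def cross_corr(X, Y):
    """p x q matrix of sample correlations between columns of X and Y."""
    n = len(X)
    Xs, Ys = standardize_columns(X), standardize_columns(Y)
    return [[sum(a * b for a, b in zip(xc, yc)) / (n - 1) for yc in Ys]
            for xc in Xs]


def leading_singular_pair(D, iters=200):
    """Power iteration for the top singular triplet (u, s, v) of D."""
    p, q = len(D), len(D[0])
    v = [1.0 / math.sqrt(q)] * q
    for _ in range(iters):
        u = [sum(D[i][j] * v[j] for j in range(q)) for i in range(p)]
        norm_u = math.sqrt(sum(t * t for t in u))
        u = [t / norm_u for t in u]
        v = [sum(D[i][j] * u[i] for i in range(p)) for j in range(q)]
        s = math.sqrt(sum(t * t for t in v))
        v = [t / s for t in v]
    return u, s, v


rng = random.Random(0)
X1, Y1 = simulate(300, 4, 3, linked=True, rng=rng)   # e.g. case group
X2, Y2 = simulate(300, 4, 3, linked=False, rng=rng)  # e.g. control group

R1, R2 = cross_corr(X1, Y1), cross_corr(X2, Y2)
# Differential cross-correlation: what changes between the two groups.
D = [[a - b for a, b in zip(r1, r2)] for r1, r2 in zip(R1, R2)]

u, s, v = leading_singular_pair(D)
print("X loadings:", [round(t, 2) for t in u])
print("Y loadings:", [round(t, 2) for t in v])
print("differential covariation strength:", round(s, 2))
```

In this toy setting the largest loadings should fall on the first variable of each set, the only pair whose correlation differs between groups. The sparse variable selection described above is not reproduced here; a sparse variant would add a penalty or thresholding step to the loading updates.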
The R package that implements dCCA is available at https://github.com/hwiyoungstat/dCCA.