European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, UK.
European Molecular Biology Laboratory (EMBL), Heidelberg, Germany.
Mol Syst Biol. 2018 Jun 20;14(6):e8124. doi: 10.15252/msb.20178124.
Multi-omics studies promise the improved characterization of biological processes across molecular layers. However, methods for the unsupervised integration of the resulting heterogeneous data sets are lacking. We present Multi-Omics Factor Analysis (MOFA), a computational method for discovering the principal sources of variation in multi-omics data sets. MOFA infers a set of (hidden) factors that capture biological and technical sources of variability. It disentangles axes of heterogeneity that are shared across multiple modalities and those specific to individual data modalities. The learnt factors enable a variety of downstream analyses, including identification of sample subgroups, data imputation and the detection of outlier samples. We applied MOFA to a cohort of 200 patient samples of chronic lymphocytic leukaemia, profiled for somatic mutations, RNA expression, DNA methylation and drug responses. MOFA identified major dimensions of disease heterogeneity, including immunoglobulin heavy-chain variable region status, trisomy of chromosome 12 and previously underappreciated drivers, such as response to oxidative stress. In a second application, we used MOFA to analyse single-cell multi-omics data, identifying coordinated transcriptional and epigenetic changes along cell differentiation.
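To make the idea of shared versus modality-specific factors concrete, the following is a minimal toy sketch, not the MOFA implementation. It assumes two simulated data modalities ("views") driven by two hidden factors, one active in both views and one active only in the first. As a stand-in for MOFA's actual inference procedure, which the abstract does not describe, it uses a plain SVD of the concatenated, centred views and then reports the fraction of variance each factor explains in each view. All function names, dimensions and noise levels here are hypothetical choices made for illustration.

```python
import numpy as np


def simulate_views(n_samples=200, noise_sd=0.1, seed=0):
    """Simulate two views driven by two hidden factors (toy data, not real omics)."""
    rng = np.random.default_rng(seed)
    # Draw two factors, centre them, and orthonormalise them so that the toy
    # example is exactly identifiable (an assumption made purely for clarity).
    z = rng.standard_normal((n_samples, 2))
    z -= z.mean(axis=0)
    q, _ = np.linalg.qr(z)
    z = q * np.sqrt(n_samples)

    # View 1 (30 features): factor 0 drives features 0-14, factor 1 drives 15-29.
    w1 = np.zeros((2, 30))
    w1[0, :15] = 1.0
    w1[1, 15:] = 1.0
    # View 2 (20 features): only factor 0 is active, so factor 0 is "shared"
    # and factor 1 is specific to view 1.
    w2 = np.zeros((2, 20))
    w2[0, :] = 1.0

    y1 = z @ w1 + noise_sd * rng.standard_normal((n_samples, 30))
    y2 = z @ w2 + noise_sd * rng.standard_normal((n_samples, 20))
    return [y1, y2]


def fit_factors(views, n_factors):
    """Estimate factors by SVD of the concatenated, centred views (a PCA stand-in)."""
    centred = [v - v.mean(axis=0) for v in views]
    joint = np.hstack(centred)
    u, s, vt = np.linalg.svd(joint, full_matrices=False)
    factors = u[:, :n_factors] * s[:n_factors]  # samples x factors
    # Split the joint loading matrix back into one block per view.
    boundaries = np.cumsum([v.shape[1] for v in views])[:-1]
    loadings = np.split(vt[:n_factors], boundaries, axis=1)
    return factors, loadings, centred


def variance_explained(factors, loadings, centred):
    """Fraction of each view's variance explained by each factor (factors x views)."""
    r2 = np.zeros((factors.shape[1], len(centred)))
    for m, (y, w) in enumerate(zip(centred, loadings)):
        total = np.sum(y ** 2)
        for k in range(factors.shape[1]):
            reconstruction = np.outer(factors[:, k], w[k])
            r2[k, m] = 1.0 - np.sum((y - reconstruction) ** 2) / total
    return r2


if __name__ == "__main__":
    views = simulate_views()
    factors, loadings, centred = fit_factors(views, n_factors=2)
    print(np.round(variance_explained(factors, loadings, centred), 3))
```

In this construction the first estimated factor should explain a substantial share of variance in both views, while the second should explain variance in the first view only. A per-factor, per-view decomposition of this kind is one way to read "shared" versus "specific" axes of heterogeneity; the plain SVD used here lacks the handling of missing values and heterogeneous data types that the applications in the abstract would require.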