Application of Unsupervised Multi-Omic Factor Analysis to Uncover Patterns of Variation and Molecular Processes Linked to Cardiovascular Disease.

Affiliations

Institute of Computational Biology, German Research Center for Environmental Health, Helmholtz Zentrum München; Department of Computer Science, TUM School of Computation, Information and Technology, Technical University of Munich.

Medizinische Klinik und Poliklinik I, University Hospital, Ludwig-Maximilian University; German Centre for Cardiovascular Research (DZHK), partner site Munich Heart Alliance.

Publication information

J Vis Exp. 2024 Sep 20;(211). doi: 10.3791/66659.

DOI:10.3791/66659

PMID:39373483

Abstract

Disease mechanisms are usually complex and governed by the interaction of several distinct molecular processes. Complex, multidimensional datasets are a valuable resource for gaining insight into those processes, but analyzing them can be challenging because of their high dimensionality, which results, for example, from different disease conditions, time points, and omics layers that capture the process at different resolutions. Here, we showcase an approach to analyze and explore such a complex multi-omics dataset in an unsupervised way by applying multi-omics factor analysis (MOFA) to a dataset generated from blood samples that capture the immune response in acute and chronic coronary syndromes. The dataset consists of several assays at differing resolutions, including sample-level cytokine data, plasma proteomics, neutrophil prime-seq, and single-cell RNA-seq (scRNA-seq) data. Further complexity is added by several time points measured per patient and by several patient subgroups. The analysis workflow outlines how to integrate and analyze the data in three steps: (1) data pre-processing and harmonization, (2) estimation of the MOFA model, and (3) downstream analysis. Step 1 outlines how to process the features of the different data types, filter out low-quality features, and normalize them to harmonize their distributions for further analysis. Step 2 shows how to apply the MOFA model and explore the major sources of variance within the dataset across all omics and features. Step 3 presents several strategies for the downstream analysis of the captured patterns, linking them to the disease conditions and to the molecular processes that potentially govern those conditions. Overall, we present a workflow for the unsupervised exploration of complex multi-omics datasets that identifies major axes of variation composed of differing molecular features and that can also be applied to other contexts and multi-omics datasets (including assays other than those presented in the exemplary use case).
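
The feature handling of Step 1 and the variance bookkeeping of Step 2 can be illustrated with a small, self-contained sketch. It is not the authors' pipeline and not the MOFA implementation. It uses only the Python standard library, and the function names (`filter_features`, `zscore`, `variance_explained`), the 50% missingness cutoff, and all toy values are assumptions made for illustration. The sketch drops low-quality features, standardizes the remaining ones, and then computes the fraction of variance that one given factor explains in one view, with the per-feature weights estimated by ordinary least squares as a stand-in for the weights a fitted factor model would provide.

```python
from statistics import mean, pstdev


def filter_features(view, max_missing=0.5):
    """Drop features that are mostly missing or show no variation.

    `view` maps a feature name to its per-sample values (None = missing).
    The 0.5 missingness cutoff is an illustrative assumption.
    """
    kept = {}
    for name, values in view.items():
        observed = [v for v in values if v is not None]
        missing_fraction = 1 - len(observed) / len(values)
        if missing_fraction > max_missing:
            continue  # too many missing values
        if len(observed) < 2 or pstdev(observed) == 0:
            continue  # constant feature, carries no signal
        kept[name] = values
    return kept


def zscore(values):
    """Scale one feature to mean 0 and sd 1; missing values stay None."""
    observed = [v for v in values if v is not None]
    mu, sd = mean(observed), pstdev(observed)
    return [None if v is None else (v - mu) / sd for v in values]


def variance_explained(view, factor):
    """Fraction of variance in a standardized view explained by one factor.

    Computed as 1 - SS_residual / SS_total over all observed entries.
    Each feature weight is a least-squares estimate (assumption: a real
    factor model would supply its own fitted weights instead).
    """
    ss_res = ss_tot = 0.0
    for values in view.values():
        pairs = [(z, y) for z, y in zip(factor, values) if y is not None]
        denom = sum(z * z for z, _ in pairs)
        weight = sum(z * y for z, y in pairs) / denom if denom else 0.0
        ss_res += sum((y - weight * z) ** 2 for z, y in pairs)
        ss_tot += sum(y * y for _, y in pairs)
    return 1 - ss_res / ss_tot


# Toy view with made-up numbers for four samples (not data from the study).
toy_view = {
    "feature_a": [1.0, 2.0, 3.0, 4.0],
    "feature_b": [2.0, 4.0, 6.0, 8.0],
    "constant": [5.0, 5.0, 5.0, 5.0],
    "mostly_missing": [None, None, None, 1.0],
}
toy_factor = [-1.5, -0.5, 0.5, 1.5]  # hypothetical per-sample factor scores

clean = {name: zscore(vals) for name, vals in filter_features(toy_view).items()}
r2 = variance_explained(clean, toy_factor)
print(sorted(clean), round(r2, 3))
```

In this toy case the two retained features vary linearly with the factor, so the factor accounts for essentially all of the view's variance. A feature that is uncorrelated with the factor would contribute none. Estimating the factors themselves is the job of the MOFA software and is deliberately left out of this sketch.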
