Odom Gabriel J, Colaprico Antonio, Silva Tiago C, Chen X Steven, Wang Lily
Department of Biostatistics, Stempel College of Public Health, Florida International University, Miami, FL, United States.
Department of Public Health Sciences, Miller School of Medicine, University of Miami, Miami, FL, United States.
Front Genet. 2021 Dec 22;12:783713. doi: 10.3389/fgene.2021.783713. eCollection 2021.
Recent advances in technology have made multi-omics datasets increasingly available to researchers. To leverage the wealth of information in multi-omics data, a number of integrative analysis strategies have recently been proposed. However, effectively extracting biological insights from these large, complex datasets remains challenging. In particular, multi-omics analysis tools often require matched samples, with multiple types of omics data measured on each sample, which can significantly reduce the sample size. Another challenge is that analysis techniques such as dimension reduction, which extracts association signals from high-dimensional datasets by estimating a few variables that explain most of the variation in the samples, are typically applied to whole-genome data, which can be computationally demanding. Here we present pathwayMultiomics, a pathway-based approach for integrative analysis of multi-omics data with categorical, continuous, or survival outcome variables. The input to pathwayMultiomics is pathway values for individual omics data types, which are then integrated using a novel statistic, the MiniMax statistic, to prioritize pathways dysregulated in multiple types of omics datasets. Importantly, pathwayMultiomics is computationally efficient and does not require matched samples in multi-omics data. We performed a comprehensive simulation study to show that pathwayMultiomics significantly outperformed currently available multi-omics tools, with improved power and well-controlled false-positive rates. We also analyzed real multi-omics datasets to show that pathwayMultiomics was able to recover known biology by nominating biologically meaningful pathways in complex diseases such as Alzheimer's disease.
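The abstract does not define the MiniMax statistic, so the following Python sketch is only an illustration under stated assumptions, not the pathwayMultiomics implementation. It assumes that the pathway values are pathway-level p-values, one per omics type; that the MiniMax statistic is the minimum, over all pairs of omics types, of the larger p-value in the pair (equivalently, the second-smallest p-value); and that under the null hypothesis the p-values are independent and uniformly distributed. The function names and the pathway p-values are hypothetical and made up for illustration.

```python
from itertools import combinations
from math import comb


def minimax_statistic(p_values):
    """Smallest, over all pairs of platforms, of the larger p-value in the pair."""
    if len(p_values) < 2:
        raise ValueError("need p-values from at least two omics platforms")
    return min(max(a, b) for a, b in combinations(p_values, 2))


def minimax_p_value(stat, n_platforms):
    """P(second-smallest of n independent Uniform(0, 1) p-values <= stat)."""
    # Assumption: independent uniform p-values under the null. The event is
    # then "at least two of the n p-values fall at or below stat".
    return sum(
        comb(n_platforms, j) * stat**j * (1 - stat) ** (n_platforms - j)
        for j in range(2, n_platforms + 1)
    )


# Made-up pathway-level p-values for three omics types, for illustration only.
pathway_p = {
    "pathway_A": [0.001, 0.004, 0.60],   # small in two omics types
    "pathway_B": [0.0001, 0.45, 0.80],   # small in only one omics type
    "pathway_C": [0.30, 0.20, 0.25],     # no strong signal anywhere
}

results = {}
for name, p in pathway_p.items():
    stat = minimax_statistic(p)
    results[name] = (stat, minimax_p_value(stat, len(p)))

# Rank pathways from most to least significant under the assumed null.
ranked = sorted(results, key=lambda name: results[name][1])
for name in ranked:
    stat, p = results[name]
    print(f"{name}: MiniMax={stat:.4f}, p={p:.6f}")
```

In this toy input, pathway_A ranks first because two omics types support it, while pathway_B ranks last despite having the smallest single p-value. This mirrors the stated goal of prioritizing pathways dysregulated in multiple types of omics data. Because only per-omics pathway summaries enter the calculation, nothing in the sketch requires the omics types to be measured on the same samples.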