Institute of Cytology and Genetics SB RAS, Novosibirsk, Lavrentieva Ave. 10, 630090, Russia.
Natural Science Department, Novosibirsk State University, Novosibirsk, Pirogova Str. 1, 630090, Russia.
Gigascience. 2018 Dec 1;7(12):giy137. doi: 10.1093/gigascience/giy137.
Genome-wide association studies have identified hundreds of loci that influence a wide variety of complex human traits; however, little is known about the biological mechanisms through which these loci act. The recent accumulation of functional genomics ("omics") data, including metabolomics data, has created new opportunities for studying the functional role of specific changes in the genome. Functional genomic data are characterized by high dimensionality, the presence of (strong) statistical dependency between traits, and, potentially, complex genetic control. Therefore, the analysis of such data requires specific statistical genetics methods.
To facilitate our understanding of the genetic control of omics phenotypes, we propose a trait-centered, network-based conditional genetic association (cGAS) approach for identifying the direct effects of genetic variants on omics-based traits. For each trait of interest, we select a set of other traits from a biological network and use them as covariates in the cGAS. The network can be reconstructed either from biological pathway databases (a mechanistic approach) or directly from the data, using a Gaussian graphical model applied to the metabolome (a data-driven approach). We derived mathematical expressions that allow the power of univariate analyses to be compared with that of conditional genetic association analyses. We then tested our approach using data from the population-based Cooperative Health Research in the Region of Augsburg (KORA) study (n = 1,784 subjects, 1.7 million single-nucleotide polymorphisms) with measurements of 151 metabolites.
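The data-driven variant of the workflow above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the network is an unregularized Gaussian graphical model estimated by inverting the sample covariance matrix (which requires more subjects than traits), neighbours are chosen with a hypothetical fixed threshold on the absolute partial correlation (the study's actual edge-selection rule may differ), genotypes use additive 0/1/2 coding, and each association is an ordinary least-squares test. The function names `partial_correlations`, `network_neighbours`, `snp_t_statistic`, and `conditional_scan` and the simulated data are hypothetical.

```python
import numpy as np


def partial_correlations(traits):
    """Partial correlations from the inverse sample covariance matrix.

    Assumption: an unregularized Gaussian graphical model estimate, so the
    number of subjects must exceed the number of traits.
    """
    precision = np.linalg.inv(np.cov(traits, rowvar=False))
    scale = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcor, 1.0)
    return pcor


def network_neighbours(pcor, trait_idx, threshold=0.1):
    """Hypothetical rule: traits whose |partial correlation| with the trait
    of interest exceeds `threshold`."""
    strength = np.abs(pcor[trait_idx]).copy()
    strength[trait_idx] = 0.0  # never select the trait itself
    return np.flatnonzero(strength > threshold)


def snp_t_statistic(y, snp, covariates=None):
    """OLS t-statistic of the SNP coefficient in y ~ 1 + snp (+ covariates)."""
    snp = np.asarray(snp, dtype=float)
    columns = [np.ones_like(snp), snp]
    if covariates is not None:
        columns.extend(np.asarray(covariates, dtype=float).T)  # one column per covariate
    X = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1] / se


def conditional_scan(traits, snp, trait_idx, threshold=0.1):
    """Univariate and network-conditional SNP t-statistics for one trait."""
    neighbours = network_neighbours(partial_correlations(traits), trait_idx, threshold)
    y = traits[:, trait_idx]
    t_uni = snp_t_statistic(y, snp)
    t_cond = snp_t_statistic(y, snp, traits[:, neighbours])
    return neighbours, t_uni, t_cond


# Hypothetical simulated example with three traits.
rng = np.random.default_rng(0)
n = 5000
snp = rng.binomial(2, 0.3, size=n)  # additive genotype coding 0/1/2
shared = rng.normal(size=n)         # non-genetic variation shared by traits 0 and 1
traits = np.column_stack([
    0.15 * snp + shared + 0.5 * rng.normal(size=n),  # trait 0: directly affected by the SNP
    shared + 0.5 * rng.normal(size=n),               # trait 1: correlated, no SNP effect
    rng.normal(size=n),                              # trait 2: unrelated
])
neighbours, t_uni, t_cond = conditional_scan(traits, snp, trait_idx=0)
print(neighbours, round(t_uni, 1), round(t_cond, 1))
```

In this toy setting, trait 1 is selected as the only neighbour of trait 0, and conditioning on it removes shared non-genetic variation from the residual. The mechanistic variant would replace `partial_correlations` and `network_neighbours` with a lookup of pathway neighbours from a database.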
We found that, compared with single-trait analysis, a genetic association analysis that includes biologically relevant covariates can either gain or lose power, depending on the specific pleiotropic scenario, for which we provide empirical examples. For the analyzed metabolomics data, the mechanistic network approach had more power than the data-driven approach. Nevertheless, we believe that our analysis shows that neither a prior-knowledge-only approach nor a phenotypic-data-only approach is optimal, and we discuss possibilities for improvement.
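Why conditioning can move power in either direction can be illustrated with a small simulation. Under ordinary least squares, the SNP's test statistic is roughly β·sqrt(n·Var(G))/σ, where β is the SNP coefficient in the fitted model, Var(G) is the genotype variance not explained by the other predictors, and σ is the residual standard deviation. This is a standard approximation, not the expressions derived in the paper. Adding a covariate gains power when it shrinks σ without shrinking β, and loses power when the covariate itself carries the SNP's effect. The two scenarios below, the effect sizes, and the helper names `snp_t` and `simulate` are hypothetical and chosen only to show each case.

```python
import numpy as np


def snp_t(y, snp, covariates=()):
    """OLS t-statistic of the SNP coefficient in y ~ 1 + snp + covariates."""
    X = np.column_stack([np.ones_like(snp), snp, *covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1] / se


def simulate(n=5000, seed=1):
    """Return (univariate t, conditional t) for two hypothetical scenarios."""
    rng = np.random.default_rng(seed)
    snp = rng.binomial(2, 0.3, size=n).astype(float)  # additive genotype coding 0/1/2

    # Scenario A: the SNP acts on y only; z shares non-genetic variation with y.
    shared = rng.normal(size=n)
    y_a = 0.15 * snp + shared + 0.5 * rng.normal(size=n)
    z_a = shared + 0.5 * rng.normal(size=n)

    # Scenario B: the SNP acts on z, and z acts on y (no direct SNP effect on y).
    z_b = 0.3 * snp + rng.normal(size=n)
    y_b = 0.8 * z_b + rng.normal(size=n)

    return {
        "A": (snp_t(y_a, snp), snp_t(y_a, snp, [z_a])),
        "B": (snp_t(y_b, snp), snp_t(y_b, snp, [z_b])),
    }


for name, (t_uni, t_cond) in simulate().items():
    print(f"scenario {name}: univariate t = {t_uni:.1f}, conditional t = {t_cond:.1f}")
```

In scenario A the covariate absorbs residual noise and the conditional test is stronger; in scenario B the covariate lies on the path from the SNP to y, so conditioning correctly reports no direct effect but loses the association signal that the univariate test detects.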