Safo Sandra E, Li Shuzhao, Long Qi
Department of Biostatistics and Bioinformatics, Emory University, Atlanta, Georgia, U.S.A.
Department of Medicine, Division of Pulmonary, Allergy and Critical Care Medicine, Emory University, Atlanta, Georgia, U.S.A.
Biometrics. 2018 Mar;74(1):300-312. doi: 10.1111/biom.12715. Epub 2017 May 8.
Integrative analysis of high-dimensional omics data is becoming increasingly popular. At the same time, incorporating known functional relationships among variables in the analysis of omics data has been shown to help elucidate underlying mechanisms for complex diseases. In this article, our goal is to assess the association between transcriptomic and metabolomic data from a Predictive Health Institute (PHI) study that includes healthy adults at high risk of developing cardiovascular diseases. Adopting a strategy that is both data-driven and knowledge-based, we develop statistical methods for sparse canonical correlation analysis (CCA) that incorporate known biological information. Our proposed methods use prior network structural information among genes and among metabolites to guide the selection of relevant genes and metabolites in sparse CCA, providing insight into the molecular underpinnings of cardiovascular disease. Our simulations demonstrate that the structured sparse CCA methods outperform several existing sparse CCA methods in selecting relevant genes and metabolites when the structural information is informative, and that they are robust to misspecified structural information. Our analysis of the PHI study reveals that a number of gene and metabolic pathways, including some known to be associated with cardiovascular diseases, are enriched in the set of genes and metabolites selected by our proposed approach.
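To make the idea of network-guided sparse CCA concrete, the following is a minimal illustrative sketch in Python with NumPy. It is not the estimator proposed in the article. It assumes a simple alternating scheme in which each canonical vector is smoothed over its prior network through the operator (I + eta * L)^(-1), where L is a graph Laplacian, and then soft-thresholded for sparsity. All function names, the tuning parameters `lam_x`, `lam_y`, and `eta`, and the toy data generator are hypothetical choices made only for this example.

```python
import numpy as np


def soft_threshold(v, lam):
    """Elementwise soft-thresholding (the proximal map of the L1 penalty)."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)


def laplacian(adjacency):
    """Unnormalized graph Laplacian L = D - A of a symmetric adjacency matrix."""
    adjacency = np.asarray(adjacency, dtype=float)
    return np.diag(adjacency.sum(axis=1)) - adjacency


def _sparse_unit(v, rel_lam):
    """Soft-threshold at rel_lam * max|v| (0 <= rel_lam < 1), then scale to unit L2 norm."""
    peak = np.abs(v).max()
    if peak == 0.0:
        return np.zeros_like(v)
    w = soft_threshold(v, rel_lam * peak)
    return w / np.linalg.norm(w)


def network_sparse_cca(X, Y, Lx, Ly, lam_x=0.5, lam_y=0.5, eta=1.0,
                       n_iter=200, tol=1e-8):
    """Heuristic first pair of sparse, network-smoothed canonical vectors.

    Assumption of this sketch: columns of X and Y have nonzero variance.
    """
    n, p = X.shape
    q = Y.shape[1]
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    Ys = (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)
    C = Xs.T @ Ys / (n - 1)                      # p x q sample cross-correlation
    Sx = np.linalg.inv(np.eye(p) + eta * Lx)     # smoothing over the X network
    Sy = np.linalg.inv(np.eye(q) + eta * Ly)     # smoothing over the Y network
    b = np.linalg.svd(C, full_matrices=False)[2][0]  # leading right singular vector
    a = np.zeros(p)
    for _ in range(n_iter):
        a_new = _sparse_unit(Sx @ (C @ b), lam_x)
        b_new = _sparse_unit(Sy @ (C.T @ a_new), lam_y)
        converged = np.linalg.norm(a_new - a) + np.linalg.norm(b_new - b) < tol
        a, b = a_new, b_new
        if converged:
            break
    return a, b


def block_path_adjacency(size, k):
    """Hypothetical prior network: a path linking the first k of `size` nodes."""
    A = np.zeros((size, size))
    i = np.arange(k - 1)
    A[i, i + 1] = 1.0
    A[i + 1, i] = 1.0
    return A


def toy_data(n=300, p=30, q=20, k=5, seed=0):
    """Toy data: the first k columns of X and of Y share one latent factor."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n, 1))
    X = rng.standard_normal((n, p))
    Y = rng.standard_normal((n, q))
    X[:, :k] = z + 0.5 * X[:, :k]
    Y[:, :k] = z + 0.5 * Y[:, :k]
    return X, Y


X, Y = toy_data()
a, b = network_sparse_cca(X, Y,
                          laplacian(block_path_adjacency(30, 5)),
                          laplacian(block_path_adjacency(20, 5)))
selected_x = np.flatnonzero(a)   # indices of selected X variables ("genes")
selected_y = np.flatnonzero(b)   # indices of selected Y variables ("metabolites")
```

In this toy setting the prior networks connect exactly the variables that carry the shared signal, which corresponds to the case of informative structural information described above. Replacing the adjacency matrices with unrelated graphs would be one way to probe sensitivity to misspecified structure, although this sketch makes no claim about how the article's own methods behave.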