Integrative analysis of transcriptomic and metabolomic data via sparse canonical correlation analysis with incorporation of biological information.

Author information

Safo Sandra E, Li Shuzhao, Long Qi

Affiliations

Department of Biostatistics and Bioinformatics, Emory University, Atlanta, Georgia, U.S.A.

Department of Medicine, Division of Pulmonary, Allergy and Critical Care Medicine, Emory University, Atlanta, Georgia, U.S.A.

Publication information

Biometrics. 2018 Mar;74(1):300-312. doi: 10.1111/biom.12715. Epub 2017 May 8.

DOI: 10.1111/biom.12715

PMID: 28482123

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5677597/

Abstract

Integrative analysis of high-dimensional omics data is becoming increasingly popular. At the same time, incorporating known functional relationships among variables into the analysis of omics data has been shown to help elucidate the mechanisms underlying complex diseases. In this article, our goal is to assess the association between transcriptomic and metabolomic data from a Predictive Health Institute (PHI) study of healthy adults at high risk of developing cardiovascular diseases. Adopting a strategy that is both data-driven and knowledge-based, we develop statistical methods for sparse canonical correlation analysis (CCA) that incorporate known biological information. Our proposed methods use prior network structural information among genes and among metabolites to guide the selection of relevant genes and metabolites in sparse CCA, providing insight into the molecular underpinnings of cardiovascular disease. Our simulations demonstrate that the structured sparse CCA methods outperform several existing sparse CCA methods in selecting relevant genes and metabolites when the structural information is informative, and that they are robust to misspecified structural information. Our analysis of the PHI study reveals that a number of gene and metabolic pathways, including some known to be associated with cardiovascular diseases, are enriched in the set of genes and metabolites selected by our proposed approach.
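The abstract describes using prior network structure among genes and among metabolites to guide variable selection in sparse CCA. As an illustration only, the Python sketch below shows one generic way a network prior can enter a rank-one sparse CCA objective; it is not the estimator proposed in this article. It assumes a penalized-matrix-decomposition-style formulation that treats within-block covariances as the identity, lasso penalties for sparsity, and a normalized graph Laplacian L = I - D^(-1/2) A D^(-1/2) whose quadratic form w'Lw penalizes differences between the degree-scaled weights of connected variables. All function names (network_sparse_cca, update_block, normalized_laplacian, and so on) and default tuning values are hypothetical.

```python
# Hypothetical illustration: network-penalized, PMD-style rank-one sparse CCA.
# This is NOT the method of Safo, Li and Long; names and defaults are made up.
import math


def center(M):
    """Subtract each column mean from a row-major n x p matrix."""
    n = len(M)
    means = [sum(col) / n for col in zip(*M)]
    return [[x - m for x, m in zip(row, means)] for row in M]


def cross_cov(X, Y):
    """K = X^T Y / (n - 1) for column-centered X (n x p) and Y (n x q)."""
    n = len(X)
    return [[sum(xr[i] * yr[j] for xr, yr in zip(X, Y)) / (n - 1)
             for j in range(len(Y[0]))]
            for i in range(len(X[0]))]


def matvec(M, x):
    """Matrix-vector product for a row-major list-of-lists matrix."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def normalized_laplacian(adj):
    """L = I - D^{-1/2} A D^{-1/2} for a symmetric, non-negative adjacency
    matrix without self-loops; isolated nodes get an all-zero row/column."""
    p = len(adj)
    deg = [sum(row) for row in adj]
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        if deg[i] > 0:
            L[i][i] = 1.0
        for j in range(p):
            if adj[i][j]:
                L[i][j] -= adj[i][j] / math.sqrt(deg[i] * deg[j])
    return L


def soft_threshold(z, a):
    """Elementwise lasso shrinkage sign(z) * max(|z| - a, 0)."""
    return [math.copysign(max(abs(zi) - a, 0.0), zi) for zi in z]


def project_unit_ball(z):
    """Euclidean projection onto {w : ||w||_2 <= 1}."""
    norm = math.sqrt(sum(zi * zi for zi in z))
    return [zi / norm for zi in z] if norm > 1.0 else z


def update_block(w, target, L, lam, alpha, inner=50):
    """Proximal-gradient ascent on w.target - (lam/2) w'Lw - alpha*||w||_1
    subject to ||w||_2 <= 1.

    The smooth part has gradient target - lam*L w, whose Lipschitz constant is
    at most 2*lam (normalized-Laplacian eigenvalues lie in [0, 2]), so a step of
    1/(2*lam) is safe. The prox of the L1 penalty plus the unit-ball constraint
    is soft-thresholding followed by projection onto the ball."""
    step = 1.0 / (2.0 * max(lam, 1e-8))  # floor avoids division by zero at lam == 0
    for _ in range(inner):
        grad = [t - lam * lw for t, lw in zip(target, matvec(L, w))]
        z = [wi + step * g for wi, g in zip(w, grad)]
        w = project_unit_ball(soft_threshold(z, step * alpha))
    return w


def network_sparse_cca(X, Y, adj_x, adj_y, lam=1.0, alpha_u=0.1, alpha_v=0.1,
                       iters=100):
    """Alternately maximize u'Kv - (lam/2)(u'Lx u + v'Ly v)
    - alpha_u*||u||_1 - alpha_v*||v||_1 over u and v."""
    X, Y = center(X), center(Y)
    K = cross_cov(X, Y)                    # p x q cross-covariance
    Kt = [list(col) for col in zip(*K)]    # its transpose, q x p
    Lx, Ly = normalized_laplacian(adj_x), normalized_laplacian(adj_y)
    q = len(Y[0])
    v = [1.0 / math.sqrt(q)] * q           # simple deterministic dense start
    u = [0.0] * len(X[0])
    for _ in range(iters):
        u = update_block(u, matvec(K, v), Lx, lam, alpha_u)
        v = update_block(v, matvec(Kt, u), Ly, lam, alpha_v)
    return u, v
```

In this sketch, alpha_u and alpha_v control how many weights are set exactly to zero, while lam controls how strongly variables that share an edge are pulled toward similar weights; for two connected genes with unequal cross-covariances, raising lam moves their weights closer together. How the tuning parameters would be chosen (for example, by cross-validation) is deliberately left out.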
