Extensions of sparse canonical correlation analysis with applications to genomic data.

Author information

Witten Daniela M, Tibshirani Robert J

Affiliation

Stanford University, USA.

Publication information

Stat Appl Genet Mol Biol. 2009;8(1):Article28. doi: 10.2202/1544-6115.1470. Epub 2009 Jun 9.

Abstract

In recent work, several authors have introduced methods for sparse canonical correlation analysis (sparse CCA). Suppose that two sets of measurements are available on the same set of observations. Sparse CCA is a method for identifying sparse linear combinations of the two sets of variables that are highly correlated with each other. It has been shown to be useful in the analysis of high-dimensional genomic data, when two sets of assays are available on the same set of samples. In this paper, we propose two extensions to the sparse CCA methodology. (1) Sparse CCA is an unsupervised method; that is, it does not make use of outcome measurements that may be available for each observation (e.g., survival time or cancer subtype). We propose an extension to sparse CCA, which we call sparse supervised CCA, which results in the identification of linear combinations of the two sets of variables that are correlated with each other and associated with the outcome. (2) It is becoming increasingly common for researchers to collect data on more than two assays on the same set of samples; for instance, SNP, gene expression, and DNA copy number measurements may all be available. We develop sparse multiple CCA in order to extend the sparse CCA methodology to the case of more than two data sets. We demonstrate these new methods on simulated data and on a recently published and publicly available diffuse large B-cell lymphoma data set.
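
To make the idea of "sparse linear combinations of the two sets of variables that are highly correlated" concrete, the following is a minimal Python sketch of one common way to compute a single pair of sparse canonical vectors: alternating soft-thresholded updates on the cross-correlation matrix. It is an illustration under stated assumptions, not the authors' implementation. It treats the within-set covariance matrices as identity, uses fixed soft-threshold penalties in place of an L1-norm bound, and all names (`sparse_cca`, `penalty_u`, `penalty_v`) and the simulated data are hypothetical.

```python
import numpy as np


def standardize(M):
    """Center each column and scale it to unit variance (assumes no constant columns)."""
    return (M - M.mean(axis=0)) / M.std(axis=0)


def soft_threshold(a, delta):
    """Shrink each entry of `a` toward zero by `delta`; small entries become exactly 0."""
    return np.sign(a) * np.maximum(np.abs(a) - delta, 0.0)


def l2_normalize(w):
    """Scale `w` to unit L2 norm; an all-zero vector is returned unchanged."""
    norm = np.linalg.norm(w)
    return w / norm if norm > 0 else w


def sparse_cca(X, Y, penalty_u=0.3, penalty_v=0.3, n_iter=50):
    """Return sparse weight vectors (u, v) for one pair of canonical variates.

    Simplifying assumptions of this sketch: within-set covariances are
    treated as identity, and sparsity comes from fixed soft-threshold
    penalties. If a penalty is too large, the weights collapse to zero.
    """
    Xs, Ys = standardize(X), standardize(Y)
    C = Xs.T @ Ys / Xs.shape[0]          # p x q cross-correlation matrix
    # Initialize v with the leading right singular vector of C.
    _, _, Vt = np.linalg.svd(C, full_matrices=False)
    v = Vt[0]
    for _ in range(n_iter):
        u = l2_normalize(soft_threshold(C @ v, penalty_u))
        v = l2_normalize(soft_threshold(C.T @ u, penalty_v))
    return u, v


# Simulated example: two "assays" on the same 200 samples share one latent
# signal z, carried only by the first three variables of each data set.
rng = np.random.default_rng(0)
n, p, q = 200, 30, 20
z = rng.normal(size=n)
X = rng.normal(size=(n, p))
Y = rng.normal(size=(n, q))
X[:, :3] += 2.0 * z[:, None]
Y[:, :3] += 2.0 * z[:, None]

u, v = sparse_cca(X, Y)
corr = np.corrcoef(standardize(X) @ u, standardize(Y) @ v)[0, 1]
print("nonzero weights in u:", np.count_nonzero(u))
print("nonzero weights in v:", np.count_nonzero(v))
print("correlation of canonical variates:", round(float(corr), 3))
```

In this sketch the penalties control how many variables receive nonzero weight; in practice they would need to be tuned, for example by a permutation or cross-validation scheme.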

Similar articles

1. Extensions of sparse canonical correlation analysis with applications to genomic data. Stat Appl Genet Mol Biol. 2009;8(1):Article28. doi: 10.2202/1544-6115.1470. Epub 2009 Jun 9.
2. Group sparse canonical correlation analysis for genomic data integration. BMC Bioinformatics. 2013 Aug 12;14:245. doi: 10.1186/1471-2105-14-245.
3. Robust sparse canonical correlation analysis. BMC Syst Biol. 2016 Aug 11;10(1):72. doi: 10.1186/s12918-016-0317-9.
4. Canonical correlation analysis for multilabel classification: a least-squares formulation, extensions, and analysis. IEEE Trans Pattern Anal Mach Intell. 2011 Jan;33(1):194-200. doi: 10.1109/TPAMI.2010.160.
5. Sparse canonical correlation analysis with application to genomic data integration. Stat Appl Genet Mol Biol. 2009;8:Article 1. doi: 10.2202/1544-6115.1406. Epub 2009 Jan 6.
6. Sparse canonical correlation analysis from a predictive point of view. Biom J. 2015 Sep;57(5):834-51. doi: 10.1002/bimj.201400226. Epub 2015 Jul 6.
7. Integrative analysis of gene expression and copy number alterations using canonical correlation analysis. BMC Bioinformatics. 2010 Apr 15;11:191. doi: 10.1186/1471-2105-11-191.
8. An iterative penalized least squares approach to sparse canonical correlation analysis. Biometrics. 2019 Sep;75(3):734-744. doi: 10.1111/biom.13043. Epub 2019 Apr 9.
9. A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics. 2009 Jul;10(3):515-34. doi: 10.1093/biostatistics/kxp008. Epub 2009 Apr 17.
10. FDR-Corrected Sparse Canonical Correlation Analysis With Applications to Imaging Genomics. IEEE Trans Med Imaging. 2018 Aug;37(8):1761-1774. doi: 10.1109/TMI.2018.2815583. Epub 2018 Mar 13.

Cited by

2. Trustworthy causal biomarker discovery: a multiomics brain imaging genetics-based approach. Bioinformatics. 2025 Jul 1;41(Supplement_1):i227-i236. doi: 10.1093/bioinformatics/btaf257.
3. Neural encoding of real world face perception. ArXiv. 2025 May 13:arXiv:2505.08831v1.
5. Mutual-assistance learning for trustworthy biomarker discovery and disease prediction. Brief Bioinform. 2025 Mar 4;26(2). doi: 10.1093/bib/bbaf178.
6. A generalized higher-order correlation analysis framework for multi-omics network inference. PLoS Comput Biol. 2025 Apr 14;21(4):e1011842. doi: 10.1371/journal.pcbi.1011842. eCollection 2025 Apr.
7. Role of evolving sea surface temperature modes of variability in improving seasonal precipitation forecasts. Commun Earth Environ. 2025;6(1):256. doi: 10.1038/s43247-025-02235-y. Epub 2025 Apr 3.
8. Multimodal data integration in early-stage breast cancer. Breast. 2025 Apr;80:103892. doi: 10.1016/j.breast.2025.103892. Epub 2025 Jan 28.
9. NMFProfiler: a multi-omics integration method for samples stratified in groups. Bioinformatics. 2025 Feb 4;41(2). doi: 10.1093/bioinformatics/btaf066.
