Robust sparse canonical correlation analysis.

Authors

Wilms Ines, Croux Christophe

Affiliation

Leuven Statistics Research Centre (LStat), KU Leuven, Naamsestraat 69, Leuven, 3000, Belgium.

Publication information

BMC Syst Biol. 2016 Aug 11;10(1):72. doi: 10.1186/s12918-016-0317-9.

DOI: 10.1186/s12918-016-0317-9

PMID: 27516087

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4982144/

Abstract

BACKGROUND

Canonical correlation analysis (CCA) is a multivariate statistical method that describes the associations between two sets of variables. The objective is to find linear combinations of the variables in each data set that have maximal correlation. In genomics, CCA has become increasingly important for estimating the associations between gene expression data and DNA copy number change data. Identifying such associations might improve our understanding of how diseases such as cancer develop. However, these data sets are typically high-dimensional, containing many variables relative to the number of objects. Moreover, the data sets might contain atypical observations, since objects are likely to react differently to treatments. We discuss a method for Robust Sparse CCA that addresses both issues. Sparse estimation produces canonical vectors in which some elements are estimated as exactly zero, which improves their interpretability. Robust methods can cope with atypical observations in the data.
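To make the CCA objective concrete, here is a minimal NumPy sketch of classical CCA (neither sparse nor robust) for the first canonical pair. It whitens each block with a Cholesky factor of its sample covariance and takes an SVD of the whitened cross-covariance. The function name and the simulated data are illustrative assumptions, not taken from the paper. Because it needs invertible sample covariance matrices, it breaks down as soon as a block has at least as many variables as observations, which is exactly the high-dimensional setting described above.

```python
import numpy as np

def first_canonical_pair(X, Y):
    """Classical (non-sparse, non-robust) first canonical pair.

    Returns weights a, b and rho such that corr(X @ a, Y @ b) = rho is maximal.
    Hypothetical helper; requires n > p and n > q so the covariances are invertible.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Sxx = Xc.T @ Xc / (n - 1)
    Syy = Yc.T @ Yc / (n - 1)
    Sxy = Xc.T @ Yc / (n - 1)
    # Whiten each block: Sxx = Lx Lx^T and Syy = Ly Ly^T (lower Cholesky factors)
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    # K = Lx^{-1} Sxy Ly^{-T}; its singular values are the canonical correlations
    K = np.linalg.solve(Ly, np.linalg.solve(Lx, Sxy).T).T
    U, s, Vt = np.linalg.svd(K)
    # Map the leading singular vectors back to weights on the original variables
    a = np.linalg.solve(Lx.T, U[:, 0])
    b = np.linalg.solve(Ly.T, Vt[0])
    return a, b, s[0]

# Simulated example: one latent signal z shared by the first column of each block
rng = np.random.default_rng(0)
n = 200
z = rng.normal(size=n)
X = np.column_stack([z + 0.5 * rng.normal(size=n), rng.normal(size=(n, 3))])
Y = np.column_stack([z + 0.5 * rng.normal(size=n), rng.normal(size=(n, 2))])
a, b, rho = first_canonical_pair(X, Y)
print(round(rho, 2))
```

The SVD route is used here because it returns the weights and the canonical correlation in one step; the sign of the singular vectors is chosen so that the two canonical variates are positively correlated.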

RESULTS

We illustrate the good performance of the Robust Sparse CCA method through several simulation studies and three biometric examples. Robust Sparse CCA considerably outperforms its main alternatives in (1) correctly detecting the main associations between the data sets, (2) accurately estimating these associations, and (3) detecting outliers.
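Point (1) concerns which variables carry the association. As a rough illustration of how sparse estimation yields canonical vectors with exact zeros, the sketch below alternates power-iteration updates on the cross-correlation matrix with soft-thresholding. It is a deliberate simplification under stated assumptions: it treats each block's covariance as the identity, uses hand-picked threshold levels `lam_a` and `lam_b`, and is not robust to outliers. It is not the authors' Robust Sparse CCA estimator; the function names and simulated data are hypothetical.

```python
import numpy as np

def soft_threshold(w, lam):
    # Shrink toward zero; entries with |w| <= lam become exactly zero
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

def sparse_cca_pair(X, Y, lam_a, lam_b, n_iter=100):
    """Sketch: one sparse canonical pair, assuming identity within-block covariances."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    Ys = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    C = Xs.T @ Ys / X.shape[0]  # p x q cross-correlation matrix
    b = np.linalg.svd(C)[2][0]  # start from the leading right singular vector
    a = np.zeros(X.shape[1])
    for _ in range(n_iter):
        a = soft_threshold(C @ b, lam_a)
        if not a.any():  # threshold too large: every entry was zeroed
            break
        a /= np.linalg.norm(a)
        b = soft_threshold(C.T @ a, lam_b)
        if not b.any():
            break
        b /= np.linalg.norm(b)
    return a, b

# Simulated example: only 3 of 20 X-variables and 2 of 15 Y-variables share signal z
rng = np.random.default_rng(2)
n = 100
z = rng.normal(size=n)
X = rng.normal(size=(n, 20))
X[:, :3] += z[:, None]
Y = rng.normal(size=(n, 15))
Y[:, :2] += z[:, None]
a, b = sparse_cca_pair(X, Y, lam_a=0.25, lam_b=0.25)
print(np.flatnonzero(a), np.flatnonzero(b))
```

Soft-thresholding is what produces the exact zeros: any coordinate whose update is smaller in magnitude than the threshold is set to 0, so variables unrelated to the shared signal drop out of the canonical vectors.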

CONCLUSIONS

Robust Sparse CCA delivers interpretable canonical vectors while coping with outlying observations. The proposed method can describe the associations between high-dimensional data sets, which are now commonplace in genomics. Furthermore, the Robust Sparse CCA method can be used to characterize outliers.
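The abstract does not describe how outliers are characterized. As one generic, hedged illustration: once canonical variates u = Xa and v = Yb are available, observations that break the linear relation between them stand out in robust residual z-scores. The sketch scales residuals by the median absolute deviation so that the outliers themselves do not inflate the scale. The function name, the caller-supplied slope, and the 2.5 cutoff are assumptions for illustration, not values from the paper.

```python
import numpy as np

def flag_outliers(u, v, slope, cutoff=2.5):
    """Flag observations that break the relation v ~ slope * u between canonical variates.

    u, v   : canonical variates for the same n observations (e.g. X @ a and Y @ b)
    slope  : assumed linear relation between them (hypothetical input)
    cutoff : hypothetical threshold on the robust z-score of the residual
    """
    r = v - slope * u
    med = np.median(r)
    # 1.4826 * MAD estimates the standard deviation under normality
    scale = 1.4826 * np.median(np.abs(r - med))
    z = (r - med) / scale
    return np.abs(z) > cutoff, z

# Simulated canonical variates with three contaminated observations
rng = np.random.default_rng(1)
u = rng.normal(size=100)
v = 0.9 * u + 0.3 * rng.normal(size=100)
v[:3] += 5.0
flags, z = flag_outliers(u, v, slope=0.9)
print(np.flatnonzero(flags))
```

Median and MAD are used instead of mean and standard deviation because a few gross outliers can inflate the classical scale enough to hide themselves.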

Similar articles

Sparse canonical correlation analysis from a predictive point of view.

Biom J. 2015 Sep;57(5):834-51. doi: 10.1002/bimj.201400226. Epub 2015 Jul 6.

Extensions of sparse canonical correlation analysis with applications to genomic data.

Stat Appl Genet Mol Biol. 2009;8(1):Article 28. doi: 10.2202/1544-6115.1470. Epub 2009 Jun 9.

An iterative penalized least squares approach to sparse canonical correlation analysis.

Biometrics. 2019 Sep;75(3):734-744. doi: 10.1111/biom.13043. Epub 2019 Apr 9.

Group sparse canonical correlation analysis for genomic data integration.

BMC Bioinformatics. 2013 Aug 12;14:245. doi: 10.1186/1471-2105-14-245.

Sparse canonical methods for biological data integration: application to a cross-platform study.

BMC Bioinformatics. 2009 Jan 26;10:34. doi: 10.1186/1471-2105-10-34.

Correlating multiple SNPs and multiple disease phenotypes: penalized non-linear canonical correlation analysis.

Bioinformatics. 2009 Nov 1;25(21):2764-71. doi: 10.1093/bioinformatics/btp491. Epub 2009 Aug 17.

Integrative analysis of gene expression and copy number alterations using canonical correlation analysis.

BMC Bioinformatics. 2010 Apr 15;11:191. doi: 10.1186/1471-2105-11-191.

Sparse canonical correlation analysis with application to genomic data integration.

Stat Appl Genet Mol Biol. 2009;8:Article 1. doi: 10.2202/1544-6115.1406. Epub 2009 Jan 6.

Resistant multiple sparse canonical correlation.

Stat Appl Genet Mol Biol. 2016 Apr;15(2):123-38. doi: 10.1515/sagmb-2014-0081.

Cited by

Longitudinal Canonical Correlation Analysis.

J R Stat Soc Ser C Appl Stat. 2023 Jun;72(3):587-607. doi: 10.1093/jrsssc/qlad022. Epub 2023 Apr 5.

sJIVE: Supervised Joint and Individual Variation Explained.

Comput Stat Data Anal. 2022 Nov;175. doi: 10.1016/j.csda.2022.107547. Epub 2022 Jun 14.

The statistical theory of linear selection indices from phenotypic to genomic selection.

Crop Sci. 2022 Mar-Apr;62(2):537-563. doi: 10.1002/csc2.20676. Epub 2022 Feb 6.

Searching for a technology-driven acute rheumatic fever test: the START study protocol.

BMJ Open. 2021 Sep 15;11(9):e053720. doi: 10.1136/bmjopen-2021-053720.

Statistical Integration of 'Omics Data Increases Biological Knowledge Extracted from Metabolomics Data: Application to Intestinal Exposure to the Mycotoxin Deoxynivalenol.

Metabolites. 2021 Jun 21;11(6):407. doi: 10.3390/metabo11060407.

Model-based joint visualization of multiple compositional omics datasets.

NAR Genom Bioinform. 2020 Jul 21;2(3):lqaa050. doi: 10.1093/nargab/lqaa050. eCollection 2020 Sep.

Reference Trait Analysis Reveals Correlations Between Gene Expression and Quantitative Traits in Disjoint Samples.

Genetics. 2019 Jul;212(3):919-929. doi: 10.1534/genetics.118.301865. Epub 2019 May 21.

Cluster analysis of replicated alternative polyadenylation data using canonical correlation analysis.

BMC Genomics. 2019 Jan 22;20(1):75. doi: 10.1186/s12864-019-5433-7.

The changes of immunoglobulin G N-glycosylation in blood lipids and dyslipidaemia.

J Transl Med. 2018 Aug 29;16(1):235. doi: 10.1186/s12967-018-1616-2.

A data-driven investigation of relationships between bipolar psychotic symptoms and schizophrenia genome-wide significant genetic loci.

Am J Med Genet B Neuropsychiatr Genet. 2018 Jun;177(4):468-475. doi: 10.1002/ajmg.b.32635. Epub 2018 Apr 19.
