Dimension-wise sparse low-rank approximation of a matrix with application to variable selection in high-dimensional integrative analyzes of association.

Authors

Poythress J C, Park Cheolwoo, Ahn Jeongyoun

Affiliations

Department of Mathematics and Statistics, University of New Hampshire, Durham, NH, USA.

Department of Mathematical Sciences, KAIST, Daejeon, The Republic of Korea.

Publication information

J Appl Stat. 2021 Aug 19;49(15):3889-3907. doi: 10.1080/02664763.2021.1967892. eCollection 2022.

Abstract

Many research proposals involve collecting multiple sources of information from a set of common samples, with the goal of performing an integrative analysis that describes the associations between sources. We propose a method that characterizes the dominant modes of co-variation between the variables in two datasets while simultaneously performing variable selection. Our method relies on a sparse, low-rank approximation of a matrix containing pairwise measures of association between the two sets of variables. We show that the proposed method is closely connected to another group of methods for integrative data analysis, namely sparse canonical correlation analysis (CCA). Under some assumptions, the proposed method and sparse CCA aim to select the same subsets of variables. We show through simulation that the proposed method can achieve better variable selection accuracy than two state-of-the-art sparse CCA algorithms. Empirically, we demonstrate through the analysis of DNA methylation and gene expression data that the proposed method selects variables whose canonical correlation is as high as or higher than that of the variables selected by sparse CCA methods. This is a rather surprising finding, given that the objective function of the proposed method does not actually maximize the canonical correlation.
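To make the idea of a sparse, low-rank approximation of an association matrix concrete, the following is a minimal sketch and not the authors' algorithm. It assumes Pearson correlation as the pairwise association measure and uses a generic L1-penalized rank-one approximation (alternating soft-thresholded power iterations) in place of the paper's dimension-wise penalty, whose exact form is not given in the abstract. The function names `cross_correlation`, `soft_threshold`, and `sparse_rank1`, and the penalty parameters `lam_u` and `lam_v`, are hypothetical.

```python
import numpy as np


def cross_correlation(X, Y):
    """Pearson correlations between every column of X and every column of Y."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    Ys = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    return Xs.T @ Ys / X.shape[0]


def soft_threshold(a, lam):
    """Shrink each entry toward zero by lam; entries within [-lam, lam] become 0."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)


def sparse_rank1(M, lam_u, lam_v, n_iter=100):
    """Sparse rank-one approximation M ~ d * outer(u, v).

    Illustrative assumption: an L1 penalty on each vector, solved by
    alternating soft-thresholded power iterations. This is a stand-in for,
    not a reproduction of, the method described in the abstract.
    """
    # Start from the leading right singular vector of M.
    _, _, Vt = np.linalg.svd(M, full_matrices=False)
    v = Vt[0]
    u = np.zeros(M.shape[0])
    for _ in range(n_iter):
        u = soft_threshold(M @ v, lam_u)
        if not u.any():  # penalty removed every row variable
            return 0.0, u, np.zeros(M.shape[1])
        u = u / np.linalg.norm(u)
        v = soft_threshold(M.T @ u, lam_v)
        if not v.any():  # penalty removed every column variable
            return 0.0, np.zeros(M.shape[0]), v
        v = v / np.linalg.norm(v)
    return float(u @ M @ v), u, v


# Toy data (made up for illustration): one latent factor drives the first
# three X variables and the first two Y variables; the rest are pure noise.
rng = np.random.default_rng(0)
n, p, q = 500, 30, 20
z = rng.standard_normal(n)
X = rng.standard_normal((n, p))
Y = rng.standard_normal((n, q))
X[:, :3] = z[:, None] + 0.5 * rng.standard_normal((n, 3))
Y[:, :2] = z[:, None] + 0.5 * rng.standard_normal((n, 2))

M = cross_correlation(X, Y)
d, u, v = sparse_rank1(M, lam_u=0.3, lam_v=0.3)
print("selected X variables:", np.flatnonzero(u))
print("selected Y variables:", np.flatnonzero(v))
```

In this sketch the nonzero entries of `u` and `v` play the role of the selected variables in each dataset. Further components could be extracted by deflating `M` (subtracting `d * np.outer(u, v)`) and repeating, although whether the paper proceeds this way is not stated in the abstract.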
