Dimension-wise sparse low-rank approximation of a matrix with application to variable selection in high-dimensional integrative analyses of association.

Authors

Poythress J C, Park Cheolwoo, Ahn Jeongyoun

Affiliations

Department of Mathematics and Statistics, University of New Hampshire, Durham, NH, USA.

Department of Mathematical Sciences, KAIST, Daejeon, The Republic of Korea.

Publication information

J Appl Stat. 2021 Aug 19;49(15):3889-3907. doi: 10.1080/02664763.2021.1967892. eCollection 2022.

DOI: 10.1080/02664763.2021.1967892

PMID: 36324486

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9621263/

Abstract

Many research proposals involve collecting multiple sources of information from a set of common samples, with the goal of performing an integrative analysis describing the associations between sources. We propose a method that characterizes the dominant modes of co-variation between the variables in two datasets while simultaneously performing variable selection. Our method relies on a sparse, low-rank approximation of a matrix containing pairwise measures of association between the two sets of variables. We show that the proposed method is closely connected to another group of methods for integrative data analysis, sparse canonical correlation analysis (CCA). Under some assumptions, the proposed method and sparse CCA aim to select the same subsets of variables. We show through simulation that the proposed method can achieve better variable selection accuracy than two state-of-the-art sparse CCA algorithms. Empirically, we demonstrate through the analysis of DNA methylation and gene expression data that the proposed method selects variables whose canonical correlations are as high as or higher than those of the variables selected by sparse CCA methods. This is a rather surprising finding, given that the objective function of the proposed method does not actually maximize the canonical correlation.
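
The general idea in the abstract, a sparse low-rank approximation of a matrix of pairwise associations, can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes Pearson correlation as the association measure, a single rank-one component, and a generic soft-thresholded alternating power iteration. The function names, the penalties `lam_u` and `lam_v`, and the simulated data are all hypothetical choices made for illustration only.

```python
import numpy as np


def cross_correlation(X, Y):
    """Return the p x q matrix of Pearson correlations between columns of X and Y."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Xs = Xc / np.linalg.norm(Xc, axis=0)
    Ys = Yc / np.linalg.norm(Yc, axis=0)
    return Xs.T @ Ys


def soft_threshold(a, lam):
    """Shrink every entry of a toward zero by lam (entries below lam become 0)."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)


def sparse_rank_one(M, lam_u, lam_v, n_iter=100, tol=1e-8):
    """Approximate M by d * outer(u, v) with sparse unit vectors u and v.

    Generic thresholded power iteration (an assumption of this sketch,
    not the method proposed in the paper).
    """
    # Initialize from the leading singular vectors of M.
    U, _, Vt = np.linalg.svd(M, full_matrices=False)
    u, v = U[:, 0], Vt[0]
    for _ in range(n_iter):
        u_new = soft_threshold(M @ v, lam_u)
        nu = np.linalg.norm(u_new)
        if nu == 0.0:  # penalty removed every row variable
            return 0.0, np.zeros_like(u), np.zeros_like(v)
        u_new = u_new / nu
        v_new = soft_threshold(M.T @ u_new, lam_v)
        nv = np.linalg.norm(v_new)
        if nv == 0.0:  # penalty removed every column variable
            return 0.0, np.zeros_like(u), np.zeros_like(v)
        v_new = v_new / nv
        change = max(np.abs(u_new - u).max(), np.abs(v_new - v).max())
        u, v = u_new, v_new
        if change < tol:
            break
    return float(u @ M @ v), u, v


def simulate(n=200, p=30, q=20, seed=0):
    """Toy data: the first 3 columns of X and first 2 of Y share one latent factor."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    X = rng.standard_normal((n, p))
    Y = rng.standard_normal((n, q))
    X[:, :3] = z[:, None] + 0.5 * rng.standard_normal((n, 3))
    Y[:, :2] = z[:, None] + 0.5 * rng.standard_normal((n, 2))
    return X, Y


X, Y = simulate()
M = cross_correlation(X, Y)
d, u, v = sparse_rank_one(M, lam_u=0.5, lam_v=0.5)
# Nonzero entries of u and v mark the variables selected in each dataset.
print("selected X variables:", np.flatnonzero(u))
print("selected Y variables:", np.flatnonzero(v))
```

In this sketch, variable selection falls out of the sparsity pattern: a variable is selected when its entry in `u` or `v` is nonzero. The penalty values are fixed by hand here; in practice they would need to be tuned, and further components would require some form of deflation.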
