

Dimension-wise sparse low-rank approximation of a matrix with application to variable selection in high-dimensional integrative analyzes of association.

Author information

Poythress J C, Park Cheolwoo, Ahn Jeongyoun

Affiliations

Department of Mathematics and Statistics, University of New Hampshire, Durham, NH, USA.

Department of Mathematical Sciences, KAIST, Daejeon, The Republic of Korea.

Publication information

J Appl Stat. 2021 Aug 19;49(15):3889-3907. doi: 10.1080/02664763.2021.1967892. eCollection 2022.

Abstract

Many research proposals involve collecting multiple sources of information from a common set of samples, with the goal of performing an integrative analysis that describes the associations between sources. We propose a method that characterizes the dominant modes of co-variation between the variables in two datasets while simultaneously performing variable selection. Our method relies on a sparse, low-rank approximation of a matrix containing pairwise measures of association between the two sets of variables. We show that the proposed method is closely connected to another group of methods for integrative data analysis: sparse canonical correlation analysis (CCA). Under some assumptions, the proposed method and sparse CCA aim to select the same subsets of variables. Through simulation, we show that the proposed method can achieve better variable selection accuracy than two state-of-the-art sparse CCA algorithms. Empirically, through the analysis of DNA methylation and gene expression data, we demonstrate that the proposed method selects variables whose canonical correlation is as high as or higher than that of the variables selected by sparse CCA methods. This is a rather surprising finding, given that the objective function of the proposed method does not actually maximize the canonical correlation.
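The pipeline described in the abstract (a matrix of pairwise associations, a sparse low-rank approximation of that matrix, and a canonical-correlation check on the selected variables) can be illustrated with a minimal NumPy sketch. It is not the authors' dimension-wise method. As assumptions, it uses Pearson correlation as the pairwise association measure and computes a sparse rank-1 approximation by soft-thresholded alternating power iteration, in the style of penalized matrix decomposition. The function names, the thresholds `lam_u` and `lam_v`, and the simulated data are all hypothetical.

```python
import numpy as np


def cross_correlation(X, Y):
    """p x q matrix of Pearson correlations between columns of X and columns of Y."""
    n = X.shape[0]
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    Ys = (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)
    return Xs.T @ Ys / (n - 1)


def soft_threshold(a, lam):
    """Shrink each entry toward zero by lam; entries with |a| <= lam become 0."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)


def _unit(a):
    """Scale a to unit length (an all-zero vector is returned unchanged)."""
    norm = np.linalg.norm(a)
    return a / norm if norm > 0 else a


def sparse_rank1(C, lam_u, lam_v, max_iter=100, tol=1e-8):
    """Sparse rank-1 approximation C ~ d * u v^T via soft-thresholded power iteration.

    Hypothetical illustrative stand-in, NOT the paper's dimension-wise method.
    """
    v = np.linalg.svd(C, full_matrices=False)[2][0]  # warm start: leading right singular vector
    u = np.zeros(C.shape[0])
    for _ in range(max_iter):
        u_new = _unit(soft_threshold(C @ v, lam_u))
        v_new = _unit(soft_threshold(C.T @ u_new, lam_v))
        done = np.linalg.norm(u_new - u) < tol and np.linalg.norm(v_new - v) < tol
        u, v = u_new, v_new
        if done:
            break
    d = u @ C @ v  # scale of the rank-1 term (non-negative by construction)
    return d, u, v


def first_canonical_correlation(X, Y):
    """Largest canonical correlation between the column spaces of centered X and Y."""
    Qx, _ = np.linalg.qr(X - X.mean(axis=0))
    Qy, _ = np.linalg.qr(Y - Y.mean(axis=0))
    return np.linalg.svd(Qx.T @ Qy, compute_uv=False)[0]


# Hypothetical simulated data: a latent signal z drives X0..X2 and Y0..Y2 only.
rng = np.random.default_rng(0)
n, p, q = 200, 20, 15
z = rng.standard_normal(n)
X = rng.standard_normal((n, p))
Y = rng.standard_normal((n, q))
X[:, :3] = z[:, None] + 0.5 * X[:, :3]
Y[:, :3] = z[:, None] + 0.5 * Y[:, :3]

C = cross_correlation(X, Y)
d, u, v = sparse_rank1(C, lam_u=0.5, lam_v=0.5)
sel_x, sel_y = np.flatnonzero(u), np.flatnonzero(v)
rho = first_canonical_correlation(X[:, sel_x], Y[:, sel_y])
print("selected X:", sel_x, "selected Y:", sel_y, "canonical corr:", round(float(rho), 3))
```

Variables with nonzero loadings in `u` and `v` are the selected ones, and larger thresholds give sparser loadings. Further components could be obtained by deflating `C` with `d * np.outer(u, v)` and repeating. How the paper itself handles higher ranks is not stated in the abstract.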


Similar articles

Robust sparse canonical correlation analysis.
BMC Syst Biol. 2016 Aug 11;10(1):72. doi: 10.1186/s12918-016-0317-9.

Canonical Correlation Analysis With Low-Rank Learning for Image Representation.
IEEE Trans Image Process. 2022;31:7048-7062. doi: 10.1109/TIP.2022.3219235. Epub 2022 Nov 14.

