Multiple augmented reduced rank regression for pan-cancer analysis.

Affiliation

Division of Biostatistics and Health Data Science, University of Minnesota, Minneapolis, MN 55414, United States.

Publication information

Biometrics. 2024 Jan 29;80(1). doi: 10.1093/biomtc/ujad002.

Abstract

Statistical approaches that successfully combine multiple datasets are more powerful, efficient, and scientifically informative than separate analyses. To address variation architectures correctly and comprehensively for high-dimensional data across multiple sample sets (ie, cohorts), we propose multiple augmented reduced rank regression (maRRR), a flexible matrix regression and factorization method to concurrently learn both covariate-driven and auxiliary structured variations. We consider a structured nuclear norm objective that is motivated by random matrix theory, in which the regression or factorization terms may be shared or specific to any number of cohorts. Our framework subsumes several existing methods, such as reduced rank regression and unsupervised multimatrix factorization approaches, and includes a promising novel approach to regression and factorization of a single dataset (aRRR) as a special case. Simulations demonstrate substantial gains in power from combining multiple datasets, and from parsimoniously accounting for all structured variations. We apply maRRR to gene expression data from multiple cancer types (ie, pan-cancer) from The Cancer Genome Atlas, with somatic mutations as covariates. The method performs well with respect to prediction and imputation of held-out data, and provides new insights into mutation-driven and auxiliary variations that are shared or specific to certain cancer types.
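To make the idea of nuclear norm penalized regression and factorization concrete, the following is a minimal sketch for a single dataset. It is not the authors' maRRR or aRRR implementation. It assumes a simplified objective, 0.5 * ||Y - B X - S||_F^2 + lam_B * ||B||_* + lam_S * ||S||_*, where B X is a covariate-driven term and S is an auxiliary low-rank term. The function names `svt` and `fit_sketch`, the penalty weights, and the alternating update scheme are all illustrative assumptions.

```python
import numpy as np


def svt(M, lam):
    """Singular value soft-thresholding: proximal operator of lam * nuclear norm."""
    U, d, Vt = np.linalg.svd(M, full_matrices=False)
    d = np.maximum(d - lam, 0.0)  # shrink singular values toward zero
    return (U * d) @ Vt


def objective(Y, X, B, S, lam_B, lam_S):
    """Assumed objective: squared error plus nuclear norm penalties on B and S."""
    resid = Y - B @ X - S
    return (0.5 * np.sum(resid ** 2)
            + lam_B * np.linalg.norm(B, "nuc")
            + lam_S * np.linalg.norm(S, "nuc"))


def fit_sketch(Y, X, lam_B, lam_S, n_iter=200):
    """Alternate updates for Y ~ B @ X + S (hypothetical single-dataset sketch).

    Y: (p, n) outcomes (e.g., genes by samples); X: (q, n) covariates.
    """
    p, n = Y.shape
    q = X.shape[0]
    B = np.zeros((p, q))
    S = np.zeros((p, n))
    # Step size from the Lipschitz constant of the smooth part with respect to B.
    step = 1.0 / max(np.linalg.norm(X, 2) ** 2, 1e-12)
    for _ in range(n_iter):
        # S-update: exact minimizer given B is a soft-thresholded residual.
        S = svt(Y - B @ X, lam_S)
        # B-update: one proximal gradient step given S.
        grad = (B @ X + S - Y) @ X.T
        B = svt(B - step * grad, step * lam_B)
    return B, S
```

In this sketch, larger values of `lam_B` or `lam_S` shrink more singular values to zero, so the ranks of the covariate-driven and auxiliary terms are selected through the penalties rather than fixed in advance. Extending the idea to several cohorts with shared and cohort-specific terms, as the abstract describes, would require additional structure not shown here.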
