Multiple Augmented Reduced Rank Regression for Pan-Cancer Analysis.

Author information

Wang Jiuzhou, Lock Eric F

Publication information

ArXiv. 2023 Aug 30:arXiv:2308.16333v1.

Abstract

Statistical approaches that successfully combine multiple datasets are more powerful, efficient, and scientifically informative than separate analyses. To address variation architectures correctly and comprehensively for high-dimensional data across multiple sample sets (i.e., cohorts), we propose multiple augmented reduced rank regression (maRRR), a flexible matrix regression and factorization method to concurrently learn both covariate-driven and auxiliary structured variation. We consider a structured nuclear norm objective that is motivated by random matrix theory, in which the regression or factorization terms may be shared or specific to any number of cohorts. Our framework subsumes several existing methods, such as reduced rank regression and unsupervised multi-matrix factorization approaches, and includes a promising novel approach to regression and factorization of a single dataset (aRRR) as a special case. Simulations demonstrate substantial gains in power from combining multiple datasets, and from parsimoniously accounting for all structured variation. We apply maRRR to gene expression data from multiple cancer types (i.e., pan-cancer) from TCGA, with somatic mutations as covariates. The method performs well with respect to prediction and imputation of held-out data, and provides new insights into mutation-driven and auxiliary variation that is shared or specific to certain cancer types.
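The abstract names the ingredients of aRRR, the single-dataset special case: a covariate-driven low-rank term and an auxiliary low-rank term, both controlled by nuclear-norm penalties. As a minimal sketch, assume the objective 1/2 ||X - BY - S||_F^2 + λ_B ||B||_* + λ_S ||S||_*, with outcomes X (p × n), covariates Y (q × n), coefficients B (p × q) and auxiliary structure S (p × n). The code below minimizes that assumed objective by alternating an exact singular value soft-thresholding update for S with one proximal-gradient step for B. The function names (svt, arrr_objective, fit_arrr_sketch), the exact penalty form and the alternating algorithm are illustrative assumptions, not the authors' published implementation. The full maRRR method additionally lets each term be shared by, or specific to, any subset of cohorts.

```python
import numpy as np


def svt(M, tau):
    """Singular value soft-thresholding: the proximal operator of tau * ||M||_*."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def arrr_objective(X, Y, B, S, lam_b, lam_s):
    """Assumed aRRR-style objective (hypothetical form, not taken from the paper)."""
    R = X - B @ Y - S
    return (0.5 * np.sum(R ** 2)
            + lam_b * np.linalg.norm(B, "nuc")
            + lam_s * np.linalg.norm(S, "nuc"))


def fit_arrr_sketch(X, Y, lam_b, lam_s, n_iter=200):
    """Hypothetical solver: X (p x n) ~ B @ Y + S, with Y (q x n) covariates.

    Alternates an exact S-update with one proximal-gradient B-update,
    so the assumed objective never increases between iterations.
    """
    p, n = X.shape
    q = Y.shape[0]
    B = np.zeros((p, q))
    S = np.zeros((p, n))
    # Step size 1/L, where L = ||Y||_2^2 is the Lipschitz constant of the B-gradient.
    step = 1.0 / np.linalg.norm(Y, 2) ** 2
    history = []
    for _ in range(n_iter):
        # Auxiliary term: exact minimizer of the objective in S given B.
        S = svt(X - B @ Y, lam_s)
        # Covariate term: gradient of 0.5 * ||X - B Y - S||_F^2 with respect to B.
        grad = -(X - B @ Y - S) @ Y.T
        B = svt(B - step * grad, step * lam_b)
        history.append(arrr_objective(X, Y, B, S, lam_b, lam_s))
    return B, S, history


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p, n, q = 50, 40, 5
    Y = rng.normal(size=(q, n))
    B_true = rng.normal(size=(p, 1)) @ rng.normal(size=(1, q))        # rank-1 effects
    S_true = 3.0 * rng.normal(size=(p, 1)) @ rng.normal(size=(1, n))  # rank-1 auxiliary
    X = B_true @ Y + S_true + rng.normal(size=(p, n))
    lam = np.sqrt(p) + np.sqrt(n)  # heuristic noise-level penalty (assumption)
    B_hat, S_hat, hist = fit_arrr_sketch(X, Y, lam, lam)
    print("rank(B_hat):", np.linalg.matrix_rank(B_hat))
    print("rank(S_hat):", np.linalg.matrix_rank(S_hat))
    print("objective:", hist[0], "->", hist[-1])
```

Setting both penalties near √p + √n, roughly the largest singular value of a p × n matrix of unit-variance Gaussian noise, is one heuristic in the spirit of the random-matrix motivation mentioned in the abstract; the authors' actual tuning may differ.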

Similar articles

Bayesian Simultaneous Factorization and Prediction Using Multi-Omic Data.
Comput Stat Data Anal. 2024 Sep;197. doi: 10.1016/j.csda.2024.107974. Epub 2024 Apr 30.

Structured Low-Rank Matrix Factorization: Global Optimality, Algorithms, and Applications.
IEEE Trans Pattern Anal Mach Intell. 2020 Jun;42(6):1468-1482. doi: 10.1109/TPAMI.2019.2900306. Epub 2019 Feb 19.

Integrative factorization of bidimensionally linked matrices.
Biometrics. 2020 Mar;76(1):61-74. doi: 10.1111/biom.13141. Epub 2019 Nov 10.

Pan-cancer analysis of differential DNA methylation patterns.
BMC Med Genomics. 2020 Oct 22;13(Suppl 10):154. doi: 10.1186/s12920-020-00780-3.