
Empirical Bayes Linked Matrix Decomposition.

Author information

Lock Eric F

Affiliation

Division of Biostatistics and Health Data Science, School of Public Health, University of Minnesota, Minneapolis, 55455, MN, USA.

Publication information

Mach Learn. 2024 Oct;113(10):7451-7477. doi: 10.1007/s10994-024-06599-8. Epub 2024 Aug 7.

Abstract

Data for several applications in diverse fields can be represented as multiple matrices that are linked across rows or columns. This is particularly common in molecular biomedical research, in which multiple molecular "omics" technologies may capture different feature sets (e.g., corresponding to rows in a matrix) and/or different sample populations (corresponding to columns). This has motivated a large body of work on integrative matrix factorization approaches that identify and decompose low-dimensional signal that is shared across multiple matrices or specific to a given matrix. We propose an empirical variational Bayesian approach to this problem that has several advantages over existing techniques, including the flexibility to accommodate shared signal over any number of row or column sets (i.e., bidimensional integration), an intuitive model-based objective function that yields appropriate shrinkage for the inferred signals, and a relatively efficient estimation algorithm with no tuning parameters. A general result establishes conditions for the uniqueness of the underlying decomposition for a broad family of methods that includes the proposed approach. For scenarios with missing data, we describe an associated iterative imputation approach that is novel for the single-matrix context and a powerful approach for "blockwise" imputation (in which an entire row or column is missing) in various linked matrix contexts. Extensive simulations show that the method performs very well under different scenarios with respect to recovering underlying low-rank signal, accurately decomposing shared and specific signals, and accurately imputing missing data. The approach is applied to gene expression and miRNA data from breast cancer tissue and normal breast tissue, for which it gives an informative decomposition of variation and outperforms alternative strategies for missing data imputation.
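The iterative imputation idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's empirical variational Bayes algorithm: a fixed-rank truncated SVD is swapped in for the model-based shrinkage, and the rank is a user-supplied assumption (the paper's method is described as having no tuning parameters). The function name `iterative_lowrank_impute` and its parameters are hypothetical.

```python
import numpy as np


def iterative_lowrank_impute(X, rank, n_iter=200, tol=1e-8):
    """Fill NaN entries of X by alternating a truncated-SVD fit with re-imputation.

    Illustrative stand-in only: a hard rank-`rank` truncation replaces the
    empirical Bayes shrinkage described in the abstract.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)

    # Initialize missing entries with the column means of the observed entries
    # (0 for a column that is entirely missing).
    col_means = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        observed = ~missing[:, j]
        if observed.any():
            col_means[j] = X[observed, j].mean()
    filled = np.where(missing, col_means, X)

    low_rank = filled
    for _ in range(n_iter):
        # Low-rank fit to the currently completed matrix.
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]

        # Keep observed entries fixed; overwrite only the missing ones.
        updated = np.where(missing, low_rank, X)
        change = np.linalg.norm(updated - filled) / max(np.linalg.norm(filled), 1e-12)
        filled = updated
        if change < tol:
            break
    return filled, low_rank


# Usage: a noise-free rank-2 matrix with roughly 10% of entries removed at random.
rng = np.random.default_rng(0)
truth = rng.normal(size=(40, 2)) @ rng.normal(size=(2, 30))
X_obs = truth.copy()
X_obs[rng.random(truth.shape) < 0.10] = np.nan
completed, signal = iterative_lowrank_impute(X_obs, rank=2)
```

Observed entries are never modified; only the missing cells are refreshed from the current low-rank fit on each pass. Blockwise missingness (an entire row or column absent) cannot be recovered from a single matrix this way, which is where the linked-matrix structure described in the abstract becomes relevant.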



