Integrative factorization of bidimensionally linked matrices.

Affiliation

Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota.

Publication information

Biometrics. 2020 Mar;76(1):61-74. doi: 10.1111/biom.13141. Epub 2019 Nov 10.

Abstract

Advances in molecular "omics" technologies have motivated new methodologies for the integration of multiple sources of high-content biomedical data. However, most statistical methods for integrating multiple data matrices only consider data shared vertically (one cohort on multiple platforms) or horizontally (different cohorts on a single platform). This is limiting for data that take the form of bidimensionally linked matrices (e.g., multiple cohorts measured on multiple platforms), which are increasingly common in large-scale biomedical studies. In this paper, we propose bidimensional integrative factorization (BIDIFAC) for integrative dimension reduction and signal approximation of bidimensionally linked data matrices. Our method factorizes data into (a) globally shared, (b) row-shared, (c) column-shared, and (d) single-matrix structural components, facilitating the investigation of shared and unique patterns of variability. For estimation, we use a penalized objective function that extends nuclear norm penalization for a single matrix. As an alternative to the complicated rank selection problem, we use results from random matrix theory to choose tuning parameters. We apply our method to integrate two genomics platforms (messenger RNA and microRNA expression) across two sample cohorts (tumor samples and normal tissue samples) using the breast cancer data from The Cancer Genome Atlas. We provide R code for fitting BIDIFAC, imputing missing values, and generating simulated data.
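The four-part decomposition described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' R code. It assumes an alternating scheme in which each component is refit by soft-thresholding the singular values of the current residual, which is the standard minimizer for a single nuclear-norm-penalized matrix. The function names `svd_soft_threshold` and `bidifac_sketch` are hypothetical, and the penalty weights (square root of the row count plus square root of the column count of the block being penalized) are an assumption in the spirit of random matrix theory for noise with unit variance, not values stated in the abstract.

```python
import numpy as np


def svd_soft_threshold(M, lam):
    """Minimize 0.5*||M - A||_F^2 + lam*||A||_* by shrinking singular values."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s = np.maximum(s - lam, 0.0)
    return (U * s) @ Vt


def bidifac_sketch(X, row_sizes, col_sizes, n_iter=50):
    """Split X into global (G), row-shared (R), column-shared (C),
    and single-matrix (I) parts by alternating soft-thresholding.

    X is the full grid of linked matrices stacked into one array;
    row_sizes and col_sizes give the block heights and widths.
    Assumes each block was pre-scaled to roughly unit noise variance.
    """
    X = np.asarray(X, dtype=float)
    m, n = X.shape
    if sum(row_sizes) != m or sum(col_sizes) != n:
        raise ValueError("block sizes must add up to the shape of X")

    row_edges = np.cumsum([0] + list(row_sizes))
    col_edges = np.cumsum([0] + list(col_sizes))
    rows = [slice(a, b) for a, b in zip(row_edges[:-1], row_edges[1:])]
    cols = [slice(a, b) for a, b in zip(col_edges[:-1], col_edges[1:])]

    G, R, C, I = (np.zeros_like(X) for _ in range(4))
    for _ in range(n_iter):
        # Global structure: low rank across the whole stacked matrix.
        G = svd_soft_threshold(X - R - C - I, np.sqrt(m) + np.sqrt(n))
        # Row-shared structure: low rank within each row block, all columns.
        for r, mi in zip(rows, row_sizes):
            resid = (X - G - C - I)[r, :]
            R[r, :] = svd_soft_threshold(resid, np.sqrt(mi) + np.sqrt(n))
        # Column-shared structure: low rank within each column block, all rows.
        for c, nj in zip(cols, col_sizes):
            resid = (X - G - R - I)[:, c]
            C[:, c] = svd_soft_threshold(resid, np.sqrt(m) + np.sqrt(nj))
        # Single-matrix structure: low rank within each individual block.
        for r, mi in zip(rows, row_sizes):
            for c, nj in zip(cols, col_sizes):
                resid = (X - G - R - C)[r, c]
                I[r, c] = svd_soft_threshold(resid, np.sqrt(mi) + np.sqrt(nj))
    return G, R, C, I
```

For a two-platform, two-cohort layout such as the one in the application, a call might look like `G, R, C, I = bidifac_sketch(X, row_sizes=[p1, p2], col_sizes=[n1, n2])`, where `p1`, `p2`, `n1`, and `n2` are placeholder block dimensions. The sum `G + R + C + I` then serves as the estimated signal. Because every penalty is fixed by the block dimensions, no ranks need to be chosen by hand.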

Similar articles

Linked matrix factorization.
Biometrics. 2019 Jun;75(2):582-592. doi: 10.1111/biom.13010. Epub 2019 Apr 2.

Cited by

Empirical Bayes Linked Matrix Decomposition.
Mach Learn. 2024 Oct;113(10):7451-7477. doi: 10.1007/s10994-024-06599-8. Epub 2024 Aug 7.
Bayesian Simultaneous Factorization and Prediction Using Multi-Omic Data.
Comput Stat Data Anal. 2024 Sep;197. doi: 10.1016/j.csda.2024.107974. Epub 2024 Apr 30.
Bayesian Distance Weighted Discrimination.
J Comput Graph Stat. 2022;31(4):1177-1188. doi: 10.1080/10618600.2022.2069778. Epub 2022 May 26.

References

Structural learning and integrative decomposition of multi-view data.
Biometrics. 2019 Dec;75(4):1121-1132. doi: 10.1111/biom.13108. Epub 2019 Sep 15.
Linked matrix factorization.
Biometrics. 2019 Jun;75(2):582-592. doi: 10.1111/biom.13010. Epub 2019 Apr 2.
Prediction With Dimension Reduction of Multiple Molecular Data Sources for Patient Survival.
Cancer Inform. 2017 Jul 11;16:1176935117718517. doi: 10.1177/1176935117718517. eCollection 2017.
R.JIVE for exploration of multi-source molecular data.
Bioinformatics. 2016 Sep 15;32(18):2877-9. doi: 10.1093/bioinformatics/btw324. Epub 2016 Jun 6.
