Fast computation of latent correlations.

Author information

Yoon Grace, Müller Christian L, Gaynanova Irina

Affiliations

Department of Statistics, Texas A&M University, College Station, TX.

Center for Computational Mathematics, Flatiron Institute, New York, NY; Department of Statistics, LMU München, Munich, Germany; Institute of Computational Biology, Helmholtz Zentrum München, Germany.

Publication information

J Comput Graph Stat. 2021;30(4):1249-1256. doi: 10.1080/10618600.2021.1882468. Epub 2021 Mar 29.

Abstract

Latent Gaussian copula models provide a powerful means to perform multi-view data integration, since these models can seamlessly express dependencies between mixed variable types (binary, continuous, zero-inflated) via latent correlations. The estimation of these latent correlations, however, comes at considerable computational cost, which has prevented the routine use of these models on high-dimensional data. Here, we propose a new computational approach for estimating latent correlations via a hybrid multilinear interpolation and optimization scheme. Our approach speeds up the current state-of-the-art computation by several orders of magnitude, thus allowing fast computation of latent Gaussian copula models even when the number of variables is large. We provide theoretical guarantees for the approximation error of our numerical scheme and demonstrate its excellent performance on simulated and real-world data. We illustrate the practical advantages of our method on high-dimensional sparse quantitative and relative abundance microbiome data as well as multi-view data from The Cancer Genome Atlas Project. Our method is implemented in the R package mixedCCA, available at https://github.com/irinagain/mixedCCA.
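The core computational step described above is inverting a "bridge" function that maps a latent correlation r to the expected Kendall's tau, and the hybrid scheme replaces most exact inversions with interpolation in a precomputed table. The Python sketch below illustrates that idea for the binary/continuous case only, under stated assumptions: it uses the binary/continuous bridge F(r; Δ) = 4Φ₂(Δ, 0; r/√2) − 2Φ(Δ) of Fan et al. (2017), where Φ(Δ) is the proportion of zeros in the binary variable; it evaluates Φ₂ through Owen's T function (`scipy.special.owens_t`) to stay deterministic and fast; and the grid ranges, grid resolution, rescaling of tau by its bound 2Φ(Δ)(1 − Φ(Δ)), and all function names (`bridge_bc`, `tau_bound`, `latent_r_bc`, ...) are hypothetical choices for illustration. It is not the mixedCCA implementation, which is an R package with its own grids and covers more variable types.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator
from scipy.optimize import brentq
from scipy.special import ndtr, owens_t


def bridge_bc(r, delta):
    """Binary/continuous bridge (assumed form): latent correlation r -> expected Kendall's tau_a.

    F(r; delta) = 4*Phi2(delta, 0; r/sqrt(2)) - 2*Phi(delta). Using the identity
    Phi2(h, 0; rho) = Phi(h)/2 + T(h, rho/sqrt(1 - rho^2)), with T Owen's T function,
    this equals 4*T(delta, r/sqrt(2 - r^2)), which avoids numerical cancellation.
    """
    r = np.asarray(r, dtype=float)
    return 4.0 * owens_t(delta, r / np.sqrt(2.0 - r**2))


def tau_bound(delta):
    """Largest attainable tau for threshold delta: F(1; delta) = 2*Phi(delta)*(1 - Phi(delta))."""
    return 2.0 * ndtr(delta) * ndtr(-delta)


def invert_by_root_finding(tau, delta):
    """Exact but slow route: solve F(r; delta) = tau on [-1, 1] (F is increasing in r)."""
    return brentq(lambda r: bridge_bc(r, delta) - tau, -1.0, 1.0, xtol=1e-10)


# Hypothetical grid. tau is rescaled by its bound, s = tau / tau_bound(delta), so the
# table is rectangular in (s, delta); |delta| <= 2 covers zero proportions of ~2%-98%.
S_GRID = np.linspace(-0.99, 0.99, 81)
D_GRID = np.linspace(-2.0, 2.0, 41)
R_TABLE = np.array(
    [[invert_by_root_finding(s * tau_bound(d), d) for d in D_GRID] for s in S_GRID]
)
INTERP = RegularGridInterpolator((S_GRID, D_GRID), R_TABLE, method="linear")


def latent_r_bc(tau, delta):
    """Hybrid estimate: bilinear interpolation inside the grid, root finding outside it."""
    bound = tau_bound(delta)
    s = tau / bound
    if S_GRID[0] <= s <= S_GRID[-1] and D_GRID[0] <= delta <= D_GRID[-1]:
        return float(INTERP([[s, delta]])[0])
    # Outside the table: clip tau just inside its attainable range, then solve exactly.
    tau = float(np.clip(tau, -0.999999 * bound, 0.999999 * bound))
    return float(invert_by_root_finding(tau, delta))


if __name__ == "__main__":
    r_true, delta = 0.6, 0.5  # in practice delta = Phi^{-1}(observed proportion of zeros)
    tau = float(bridge_bc(r_true, delta))
    print(latent_r_bc(tau, delta))  # prints a value near 0.6
```

The table is built once, so each later query costs a bilinear lookup instead of a root search; queries outside the table (for example, very rare or very common zeros) fall back to the slower exact inversion, which is the "hybrid" part of the sketch.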

Similar articles

Bayesian Variable Selection for Gaussian copula regression models.
J Comput Graph Stat. 2020 Dec 10;30(3):578-593. doi: 10.1080/10618600.2020.1840997.

Bayesian Gaussian Copula Factor Models for Mixed Data.
J Am Stat Assoc. 2013 Jun 1;108(502):656-665. doi: 10.1080/01621459.2012.762328.

Canonical correlation analysis for elliptical copulas.
J Multivar Anal. 2021 May;183. doi: 10.1016/j.jmva.2020.104715. Epub 2020 Nov 23.
