Bayesian Gaussian Copula Factor Models for Mixed Data.

Author information

Murray Jared S, Dunson David B, Carin Lawrence, Lucas Joseph E

Affiliation

Dept. of Statistical Science, Duke University, Durham, NC 27708

Publication information

J Am Stat Assoc. 2013 Jun 1;108(502):656-665. doi: 10.1080/01621459.2012.762328.

DOI:10.1080/01621459.2012.762328

PMID:23990691

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3753118/

Abstract

Gaussian factor models have proven widely useful for parsimoniously characterizing dependence in multivariate data. There is a rich literature on their extension to mixed categorical and continuous variables, using latent Gaussian variables or through generalized latent trait models accommodating measurements in the exponential family. However, when generalizing to non-Gaussian measured variables, the latent variables typically influence both the dependence structure and the form of the marginal distributions, complicating interpretation and introducing artifacts. To address this problem, we propose a novel class of Bayesian Gaussian copula factor models which decouple the latent factors from the marginal distributions. A semiparametric specification for the marginals based on the extended rank likelihood yields straightforward implementation and substantial computational gains. We provide new theoretical and empirical justifications for using this likelihood in Bayesian inference. We propose new default priors for the factor loadings and develop efficient parameter-expanded Gibbs sampling for posterior computation. The methods are evaluated through simulations and applied to a dataset in political science. The models in this paper are implemented in the R package bfa.
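
To make the decoupling described in the abstract concrete, the following is a minimal illustrative sketch of simulating mixed data from a Gaussian copula with a one-factor dependence structure. It is not the authors' sampler and does not use the bfa package; the function name, the loadings, and the four marginal distributions are all hypothetical choices made for illustration. The point it tries to show is that the factor loadings determine only the latent correlation matrix, while each marginal is applied afterwards through a monotone transformation.

```python
import math
import numpy as np


def simulate_copula_factor_data(n=2000, seed=0):
    """Simulate mixed data from a Gaussian copula with one latent factor.

    All numeric choices below are hypothetical and for illustration only.
    """
    rng = np.random.default_rng(seed)

    # Hypothetical loadings: p = 4 observed variables, k = 1 latent factor.
    loadings = np.array([[0.9], [0.8], [0.7], [0.6]])
    p, k = loadings.shape

    eta = rng.standard_normal((n, k))   # latent factors
    eps = rng.standard_normal((n, p))   # idiosyncratic noise, unit variance
    z = eta @ loadings.T + eps          # latent Gaussian variables

    # Rescale each column to unit variance so that Phi(z) is Uniform(0, 1).
    scale = np.sqrt((loadings ** 2).sum(axis=1) + 1.0)
    z = z / scale

    # Standard normal CDF computed with math.erf (avoids a SciPy dependency).
    u = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))

    # Marginals are chosen freely; none of them depends on the loadings.
    y_cont = -np.log1p(-u[:, 0])                    # Exponential(1), inverse CDF
    y_bin = (u[:, 1] > 0.7).astype(int)             # binary, P(y = 1) = 0.3
    y_ord = np.digitize(u[:, 2], [0.2, 0.5, 0.9])   # ordinal, levels 0..3
    y_skew = u[:, 3] ** 3                           # skewed continuous on (0, 1)
    y = np.column_stack([y_cont, y_bin, y_ord, y_skew])

    # Latent correlation matrix implied by the loadings alone.
    corr = (loadings @ loadings.T + np.eye(p)) / np.outer(scale, scale)
    return y, z, corr


if __name__ == "__main__":
    y, z, corr = simulate_copula_factor_data()
    print(np.round(corr, 2))
    print(np.round(np.corrcoef(z, rowvar=False), 2))
```

Because every marginal transformation in the sketch is non-decreasing, the ordering of each observed column agrees with the ordering of its latent Gaussian column. A likelihood built only from those orderings, which is the idea behind the extended rank likelihood mentioned in the abstract, therefore does not require the marginals to be specified.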
