Chakraborty Antik, Ou Rihui, Dunson David B
Department of Statistics, Purdue University.
Department of Statistical Science, Duke University.
J Am Stat Assoc. 2024;119(548):2560-2571. doi: 10.1080/01621459.2023.2260053. Epub 2023 Nov 9.
It has become increasingly common to collect high-dimensional binary response data, for example with the emergence of new sampling techniques in ecology. In smaller dimensions, multivariate probit (MVP) models are routinely used for inference. However, algorithms for fitting such models face issues in scaling up to high dimensions because the likelihood is intractable: it involves a multivariate normal integral that has no analytic form. Although a variety of algorithms have been proposed to approximate this intractable integral, these approaches are difficult to implement and/or inaccurate in high dimensions. Our main focus is on accommodating high-dimensional binary response data with a small-to-moderate number of covariates. We propose a two-stage approach for inference on model parameters that accounts for uncertainty propagation between the stages. We use the special structure of latent Gaussian models to reduce the highly expensive computation involved in joint parameter estimation by focusing inference on the marginal distributions of model parameters. This makes the method embarrassingly parallel in both stages. We illustrate its performance in simulations and in applications to joint species distribution modeling in ecology.
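To make the computational bottleneck concrete, the sketch below assumes the standard MVP formulation for a single observation: a latent vector Z ~ N(μ, R), with μ_j = x'β_j and R a correlation matrix, and binary responses Y_j = 1{Z_j > 0}. Under this formulation the likelihood of an observed vector y is the Gaussian mass of an orthant, which the sketch estimates by plain Monte Carlo. It also shows the structural fact the abstract relies on: each marginal probability P(Y_j = 1) = Φ(μ_j) has a closed form and does not involve R, so per-response computations can run independently. The function names, the Monte Carlo estimator, and the example values are hypothetical illustrations (NumPy is assumed to be available), not the authors' implementation.

```python
import math

import numpy as np


def std_normal_cdf(t: float) -> float:
    """Standard normal CDF Phi(t), computed via the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def mvp_joint_prob_mc(y, mu, R, n_draws=200_000, seed=0):
    """Monte Carlo estimate of P(Y = y) for one observation of an MVP model.

    Assumed model: Z ~ N(mu, R) with R a correlation matrix and Y_j = 1{Z_j > 0}.
    P(Y = y) is the N(mu, R) mass of the orthant selected by y, which has no
    closed form for a general R.
    """
    rng = np.random.default_rng(seed)
    z = rng.multivariate_normal(
        np.asarray(mu, dtype=float), np.asarray(R, dtype=float), size=n_draws
    )
    # A draw counts as a hit when every latent sign matches the observed vector.
    hits = np.all((z > 0) == (np.asarray(y) == 1), axis=1)
    return float(hits.mean())


def mvp_marginal_prob(j, mu):
    """P(Y_j = 1) = Phi(mu_j): a univariate probit that does not depend on R."""
    return std_normal_cdf(float(mu[j]))


if __name__ == "__main__":
    mu = np.array([0.3, -0.5, 0.8])  # hypothetical linear predictors x'beta_j
    R = np.array([[1.0, 0.5, 0.2],
                  [0.5, 1.0, 0.3],
                  [0.2, 0.3, 1.0]])  # hypothetical correlation matrix
    print("joint P(Y = (1, 0, 1)):", mvp_joint_prob_mc([1, 0, 1], mu, R))
    # Each marginal uses only its own mu_j, so this loop could be run in parallel.
    print("marginals:", [mvp_marginal_prob(j, mu) for j in range(len(mu))])
```

Two checks are convenient here: with R equal to the identity the joint probability factorizes into the product of the marginals, and for two responses with zero means the orthant probability is 1/4 + arcsin(ρ)/(2π). Plain Monte Carlo is used only for readability; its relative error grows as the target probability shrinks, and the probability of any particular binary vector becomes very small as the dimension grows.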