Southwest Biological Science Center, U.S. Geological Survey, 2255 North Gemini Drive, Flagstaff, Arizona, 86001, USA.
USDA Forest Service, Rocky Mountain Research Station, Flagstaff, Arizona, 86001, USA.
Ecol Appl. 2020 Jul;30(5):e02112. doi: 10.1002/eap.2112. Epub 2020 Apr 1.
Bayesian population models can be exceedingly slow due, in part, to the choice to simulate discrete latent states. Here, we discuss an alternative approach to discrete latent states, marginalization, that forms the basis of maximum likelihood population models and is much faster. Our manuscript has two goals: (1) to introduce readers unfamiliar with marginalization to the concept and provide worked examples and (2) to address topics associated with marginalization that have not been previously synthesized and are relevant to both Bayesian and maximum likelihood models. We begin by explaining marginalization using a Cormack-Jolly-Seber model. Next, we apply marginalization to multistate capture-recapture, community occupancy, and integrated population models and briefly discuss random effects, priors, and pseudo-R². Then, we focus on recovery of discrete latent states, defining different types of conditional probabilities and showing how quantities such as population abundance or species richness can be estimated in marginalized code. Last, we show that occupancy and site-abundance models with auto-covariates can be fit with marginalized code with minimal impact on parameter estimates. Marginalized code was anywhere from five to >1,000 times faster than discrete code and differences in inferences were minimal. Discrete latent states and fully conditional approaches provide the best estimates of conditional probabilities for a given site or individual. However, estimates for parameters and derived quantities such as species richness and abundance are minimally affected by marginalization. In the case of abundance, marginalized code is both quicker and has lower bias than an N-augmentation approach. Understanding how marginalization works shrinks the divide between Bayesian and maximum likelihood approaches to population models. Some models that have only been presented in a Bayesian framework can easily be fit in maximum likelihood. On the other hand, factors such as informative priors, random effects, or pseudo-R² values may motivate a Bayesian approach in some applications. An understanding of marginalization allows users to minimize the speed that is sacrificed when switching from a maximum likelihood approach. Widespread application of marginalization in Bayesian population models will facilitate more thorough simulation studies, comparisons of alternative model structures, and faster learning.
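To make the idea of marginalization concrete, the following is a minimal Python sketch for a single capture history under a Cormack-Jolly-Seber model. It is not code from the manuscript. It assumes constant survival (`phi`) and detection (`p`) probabilities, conditions on first capture, and uses hypothetical function names. The first function sums out the alive/dead latent state with a forward recursion; the second enumerates every discrete latent-state path explicitly, which is the quantity the recursion computes without simulation.

```python
import itertools


def cjs_marginal_likelihood(history, phi, p):
    """Probability of a capture history (conditional on first capture),
    with the alive/dead latent state summed out by a forward recursion.

    Assumes constant survival `phi` and detection `p` (an illustrative
    simplification, not the manuscript's model code).
    """
    first = history.index(1)
    alive, dead = 1.0, 0.0  # the animal is known to be alive at first capture
    for y in history[first + 1:]:
        # Survival step: move probability mass from alive to dead.
        alive, dead = alive * phi, dead + alive * (1.0 - phi)
        # Detection step: weight each state by the probability of the observation.
        if y == 1:
            alive, dead = alive * p, 0.0  # a dead animal cannot be detected
        else:
            alive, dead = alive * (1.0 - p), dead
    return alive + dead


def cjs_discrete_sum(history, phi, p):
    """Same probability, computed by enumerating every latent-state path."""
    first = history.index(1)
    obs = history[first + 1:]
    total = 0.0
    for z in itertools.product([1, 0], repeat=len(obs)):
        prob, prev = 1.0, 1
        for z_t, y_t in zip(z, obs):
            if prev == 1:
                prob *= phi if z_t == 1 else 1.0 - phi
            else:
                prob *= 0.0 if z_t == 1 else 1.0  # dead animals stay dead
            if z_t == 1:
                prob *= p if y_t == 1 else 1.0 - p
            else:
                prob *= 0.0 if y_t == 1 else 1.0
            prev = z_t
        total += prob
    return total


if __name__ == "__main__":
    h = [1, 0, 1, 0]
    print(cjs_marginal_likelihood(h, phi=0.8, p=0.5))
    print(cjs_discrete_sum(h, phi=0.8, p=0.5))
```

The two functions agree up to floating-point error, but the enumeration grows as 2^T with the number of occasions T, while the recursion grows linearly. A sampler that simulates the latent states avoids the enumeration but must still update every state at every iteration, which is one way to see where the reported speed differences could come from.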