Zhang W, Bravington M V, Fewster R M
Department of Statistics, University of Auckland, Private Bag 92019, Auckland, New Zealand.
CSIRO Marine Lab, GPO Box 1538, Hobart, TAS, 7001, Australia.
Biometrics. 2019 Sep;75(3):723-733. doi: 10.1111/biom.13030. Epub 2019 Apr 4.
Latent count models constitute an important modeling class in which a latent vector of counts, \(\mathbf{z}\), is summarized or corrupted for reporting, yielding observed data \(\mathbf{y} = \mathbf{A}\mathbf{z}\), where \(\mathbf{A}\) is a known but non-invertible matrix. The observed vector \(\mathbf{y}\) generally follows an unknown multivariate distribution with a complicated dependence structure. Latent count models arise in diverse fields, such as estimation of population size from capture-recapture studies; inference on multi-way contingency tables summarized by marginal totals; or analysis of route flows in networks based on traffic counts at a subset of nodes. Currently, inference under these models relies primarily on stochastic algorithms for sampling the latent vector \(\mathbf{z}\), typically in a Bayesian data-augmentation framework. These schemes involve long computation times and can be difficult to implement. Here, we present a novel maximum-likelihood approach using likelihoods constructed by the saddlepoint approximation. We show how the saddlepoint likelihood may be maximized efficiently, yielding fast inference even for large problems. For the case where \(\mathbf{z}\) has a multinomial distribution, we validate the approximation by applying it to a specific model for which an exact likelihood is available. We implement the method for several models of interest, and evaluate its performance empirically and by comparison with other estimation approaches. The saddlepoint method consistently gives fast and accurate inference, even when \(\mathbf{y}\) is dominated by small counts.
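To make the idea concrete, the following is a minimal sketch of a saddlepoint log-likelihood for observed data y = Az when z is multinomial with index N and cell probabilities p. It uses the standard first-order saddlepoint density formula, with the cumulant generating function K(s) = N log(sum_j p_j exp((A^T s)_j)), and solves the saddlepoint equation K'(s) = y by damped Newton iteration. The function name `saddlepoint_loglik`, the example matrix `A`, and the values of `p`, `N`, and `y` are illustrative assumptions, not the authors' implementation, and the sketch assumes the covariance of y is nonsingular.

```python
import math
import numpy as np


def saddlepoint_loglik(y, A, p, N, tol=1e-8, max_iter=100):
    """Approximate log f(y) for y = A z, z ~ Multinomial(N, p) (illustrative sketch).

    Assumes K''(s) is nonsingular, i.e. y has no exact linear constraint.
    Returns the approximate log-likelihood and the saddlepoint s.
    """
    y = np.asarray(y, dtype=float)
    A = np.asarray(A, dtype=float)
    p = np.asarray(p, dtype=float)
    d = A.shape[0]

    def cgf_parts(s):
        # K(s) = N * log(sum_j p_j * exp(eta_j)), eta = A^T s, computed stably.
        eta = A.T @ s
        m = eta.max()
        w = p * np.exp(eta - m)
        total = w.sum()
        q = w / total                      # tilted cell probabilities
        K = N * (m + math.log(total))
        grad = N * (A @ q)                 # K'(s)
        hess = N * (A @ (np.diag(q) - np.outer(q, q)) @ A.T)  # K''(s)
        return K, grad, hess

    s = np.zeros(d)
    for _ in range(max_iter):
        K, grad, hess = cgf_parts(s)
        resid = grad - y
        if np.max(np.abs(resid)) < tol:
            break
        step = np.linalg.solve(hess, resid)
        # Backtracking: K(s) - s.y is convex in s, so halve until it does not increase.
        f0 = K - s @ y
        t = 1.0
        while t > 1e-8:
            s_new = s - t * step
            if cgf_parts(s_new)[0] - s_new @ y <= f0 + 1e-12:
                break
            t *= 0.5
        s = s - t * step

    K, _, hess = cgf_parts(s)
    _, logdet = np.linalg.slogdet(hess)
    loglik = K - s @ y - 0.5 * d * math.log(2.0 * math.pi) - 0.5 * logdet
    return loglik, s


# Hypothetical example: three latent cells observed only through two overlapping sums.
A_example = [[1, 1, 0], [0, 1, 1]]
p_example = [0.2, 0.3, 0.5]
loglik_example, s_hat = saddlepoint_loglik([14, 25], A_example, p_example, N=30)
print(loglik_example, s_hat)
```

In a maximum-likelihood setting, `p` would depend on model parameters and this function would be passed to a numerical optimizer. With A = [[1, 1, 0]], y is binomial, so the approximation can be checked directly against the exact binomial log-probability.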