Ramirez Alexandro D, Paninski Liam
Weill Cornell Medical College, New York, NY, USA.
J Comput Neurosci. 2014 Apr;36(2):215-34. doi: 10.1007/s10827-013-0466-4. Epub 2013 Jul 6.
Generalized linear models play an essential role in a wide variety of statistical applications. This paper discusses an approximation of the likelihood in these models that can greatly facilitate computation. The basic idea is to replace a sum that appears in the exact log-likelihood by an expectation over the model covariates; the resulting "expected log-likelihood" (EL) can in many cases be computed significantly faster than the exact log-likelihood. In many neuroscience experiments the distribution over model covariates is controlled by the experimenter, and the EL approximation becomes particularly useful; for example, estimators based on maximizing the EL (or a penalized version thereof) can often be obtained with orders-of-magnitude computational savings compared to the exact maximum likelihood estimators. A risk analysis establishes that these maximum EL estimators often come with little cost in accuracy (and in some cases even improved accuracy) compared to standard maximum likelihood estimates. Finally, we find that these methods can significantly decrease the computation time of marginal likelihood calculations for model selection and of Markov chain Monte Carlo methods for sampling from the posterior parameter distribution. We illustrate our results by applying these methods to a computationally challenging dataset of neural spike trains obtained via large-scale multi-electrode recordings in the primate retina.
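The basic idea in the abstract, replacing a sum over observations by an expectation over the covariates, can be illustrated with a small numerical sketch. The abstract does not fix a model, so everything specific here is an assumption for illustration: a Poisson GLM with an exponential nonlinearity, Gaussian covariates with known mean `mu` and covariance `Sigma`, and the hypothetical helper names `exact_loglik` and `expected_loglik`. Under those assumptions, the sum of exp(x_i·theta) over n observations is approximated by n·E[exp(x·theta)] = n·exp(theta·mu + theta·Sigma·theta / 2), which follows from the Gaussian moment generating function.

```python
import numpy as np


def exact_loglik(theta, X, y):
    """Exact Poisson log-likelihood (up to a constant): sum_i y_i*eta_i - exp(eta_i)."""
    eta = X @ theta
    return y @ eta - np.exp(eta).sum()


def expected_loglik(theta, Xty, n, mu, Sigma):
    """Expected log-likelihood: the sum of exp(eta_i) is replaced by
    n * E[exp(x . theta)], which is closed-form for Gaussian covariates."""
    return Xty @ theta - n * np.exp(theta @ mu + 0.5 * theta @ Sigma @ theta)


# Simulated data (illustrative values only, not taken from the paper).
rng = np.random.default_rng(0)
n, d = 200_000, 5
mu = np.zeros(d)
Sigma = np.eye(d)
theta = np.array([0.3, -0.2, 0.1, 0.0, 0.25])

X = rng.multivariate_normal(mu, Sigma, size=n)  # covariates chosen by the "experimenter"
y = rng.poisson(np.exp(X @ theta))              # simulated spike counts

Xty = X.T @ y  # sufficient statistic, computed once in a single pass over the data

ll = exact_loglik(theta, X, y)
el = expected_loglik(theta, Xty, n, mu, Sigma)
rel_err = abs(ll - el) / abs(ll)
print(f"exact: {ll:.1f}  expected: {el:.1f}  relative difference: {rel_err:.2e}")
```

In this sketch, once `Xty` has been computed, each evaluation of `expected_loglik` costs on the order of d² operations regardless of n, whereas `exact_loglik` touches all n rows of `X` every time. That difference is the source of the computational savings the abstract describes, at least in this simplified setting.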