Zoltowski David M, Pillow Jonathan W
Princeton Neuroscience Institute, Princeton University; Princeton, NJ 08544.
Princeton Neuroscience Institute & Psychology, Princeton University; Princeton, NJ 08544.
Adv Neural Inf Process Syst. 2018 Dec;31:3517-3527.
Recent advances in recording technologies have allowed neuroscientists to record simultaneous spiking activity from hundreds to thousands of neurons in multiple brain regions. Such large-scale recordings pose a major challenge to existing statistical methods for neural data analysis. Here we develop highly scalable approximate inference methods for Poisson generalized linear models (GLMs) that require only a single pass over the data. Our approach relies on a recently proposed method for obtaining approximate sufficient statistics for GLMs using polynomial approximations [7], which we adapt to the Poisson GLM setting. We focus on inference using quadratic approximations to nonlinear terms in the Poisson GLM log-likelihood with Gaussian priors, for which we derive closed-form solutions to the approximate maximum likelihood and MAP estimates, posterior distribution, and marginal likelihood. We introduce an adaptive procedure to select the polynomial approximation interval and show that the resulting method allows for efficient and accurate inference and regularization of high-dimensional parameters. We use the quadratic estimator to fit a fully-coupled Poisson GLM to spike train data recorded from 831 neurons across five regions of the mouse brain for a duration of 41 minutes, binned at 1 ms resolution. Across all neurons, this model is fit to over 2 billion spike count bins and identifies fine-timescale statistical dependencies between neurons within and across cortical and subcortical areas.
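The single-pass, closed-form estimate described above can be illustrated with a minimal sketch. It rests on several assumptions not stated in the abstract: an exponential nonlinearity, so that the log-likelihood is sum_t [y_t x_t'w - dt exp(x_t'w)] up to constants; a least-squares quadratic fit of exp(u) on a fixed interval [lo, hi], used here as a stand-in for the paper's polynomial approximation and adaptive interval selection; and an isotropic Gaussian prior with precision `prior_precision`. Under these assumptions the approximate log-posterior is quadratic in w, and its maximizer solves (2 dt a2 sum_t x_t x_t' + prior_precision I) w = sum_t y_t x_t - dt a1 sum_t x_t. All function and class names are hypothetical, and the code uses only the Python standard library to stay self-contained.

```python
import math
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_quadratic_to_exp(lo, hi, n_grid=201):
    """Least-squares fit exp(u) ~ a0 + a1*u + a2*u**2 on a grid over [lo, hi].

    Assumption: a plain grid fit stands in for the paper's approximation.
    """
    us = [lo + (hi - lo) * k / (n_grid - 1) for k in range(n_grid)]
    # Normal equations for the monomial basis (1, u, u^2).
    G = [[sum(u ** (i + j) for u in us) for j in range(3)] for i in range(3)]
    r = [sum(math.exp(u) * u ** i for u in us) for i in range(3)]
    return solve(G, r)  # [a0, a1, a2]


class QuadraticPoissonStats:
    """Running sums that suffice for the quadratic-approximation estimate."""

    def __init__(self, dim):
        self.dim = dim
        self.xy = [0.0] * dim                        # sum_t y_t * x_t
        self.x = [0.0] * dim                         # sum_t x_t
        self.xx = [[0.0] * dim for _ in range(dim)]  # sum_t x_t x_t^T

    def update(self, x_t, y_t):
        """Fold one time bin into the sums; each bin is visited once."""
        for i in range(self.dim):
            self.xy[i] += y_t * x_t[i]
            self.x[i] += x_t[i]
            for j in range(self.dim):
                self.xx[i][j] += x_t[i] * x_t[j]


def map_estimate(stats, a1, a2, dt=1.0, prior_precision=0.0):
    """Closed-form maximizer of the approximate log-posterior.

    With prior_precision=0 this is the approximate maximum likelihood estimate.
    Requires a2 > 0 so that the approximate objective is concave.
    """
    d = stats.dim
    A = [[2.0 * dt * a2 * stats.xx[i][j] + (prior_precision if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [stats.xy[i] - dt * a1 * stats.x[i] for i in range(d)]
    return solve(A, b)


def sample_poisson(rate, rng):
    """Knuth's multiplication method; adequate for small rates."""
    limit, k, p = math.exp(-rate), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


if __name__ == "__main__":
    rng = random.Random(1)
    w_true = [0.2, -0.1]  # hypothetical weights for a toy simulation
    stats = QuadraticPoissonStats(len(w_true))
    for _ in range(5000):
        x_t = [rng.gauss(0.0, 1.0) for _ in w_true]
        rate = math.exp(sum(w * x for w, x in zip(w_true, x_t)))
        stats.update(x_t, sample_poisson(rate, rng))
    a0, a1, a2 = fit_quadratic_to_exp(-1.0, 1.0)
    print("estimate:", map_estimate(stats, a1, a2, prior_precision=1.0))
```

Because the estimate depends on the data only through the three running sums, the bins can be streamed in any order or accumulated in parallel and summed afterwards. The accuracy of the result depends on how well the quadratic matches exp(u) over the range of x_t'w actually visited, which is why the choice of approximation interval matters.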