Center for Neural Science, New York University, New York, New York, United States of America.
Department of Psychology, Princeton University, Princeton, New Jersey, United States of America.
PLoS Comput Biol. 2020 Dec 23;16(12):e1008483. doi: 10.1371/journal.pcbi.1008483. eCollection 2020 Dec.
The fate of scientific hypotheses often relies on the ability of a computational model to explain the data, quantified in modern statistical approaches by the likelihood function. The log-likelihood is the key element for parameter estimation and model evaluation. However, the log-likelihood of complex models in fields such as computational biology and neuroscience is often intractable to compute analytically or numerically. In those cases, researchers can often only estimate the log-likelihood by comparing observed data with synthetic observations generated by model simulations. Standard techniques to approximate the likelihood via simulation either use summary statistics of the data or are at risk of producing substantial biases in the estimate. Here, we explore another method, inverse binomial sampling (IBS), which can estimate the log-likelihood of an entire data set efficiently and without bias. For each observation, IBS draws samples from the simulator model until one matches the observation. The log-likelihood estimate is then a function of the number of samples drawn. The variance of this estimator is uniformly bounded, it achieves the minimum variance possible for an unbiased estimator, and we can compute calibrated estimates of it. We provide theoretical arguments in favor of IBS and an empirical assessment of the method for maximum-likelihood estimation with simulation-based models. As case studies, we take three model-fitting problems of increasing complexity from computational and cognitive neuroscience. In all problems, IBS generally produces lower error in the estimated parameters and maximum log-likelihood values than alternative sampling methods with the same average number of samples. Our results demonstrate the potential of IBS as a practical, robust, and easy-to-implement method for log-likelihood evaluation when exact techniques are not available.
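The core IBS procedure described in the abstract (draw simulated samples until one matches the observation, then map the number of draws to a log-likelihood estimate) can be sketched in a few lines. This is a minimal illustration, not the authors' reference implementation: `simulate` is a hypothetical zero-argument sampler standing in for the simulator model, and the mapping from the draw count K to the estimate -(1/1 + 1/2 + ... + 1/(K-1)), which is unbiased for log p when K is geometrically distributed with success probability p, follows the IBS estimator the paper analyzes.

```python
import math
import random

def ibs_loglike_estimate(simulate, observation):
    """Unbiased IBS estimate of log p(observation) for a single trial.

    `simulate` is a zero-argument callable returning one i.i.d. synthetic
    response from the model. Samples are drawn until one equals
    `observation`; if the first match occurs on draw K, the estimate is
    -sum_{k=1}^{K-1} 1/k.
    """
    k = 1  # number of draws so far (the first draw happens in the test below)
    while simulate() != observation:
        k += 1
    # Partial harmonic sum; empty (i.e. 0) when the first draw matches.
    return -sum(1.0 / j for j in range(1, k))

def demo(p=0.5, n_reps=200_000, seed=1):
    """Average many IBS estimates for a Bernoulli model with P(True) = p.

    The mean should approach the true log-likelihood log(p).
    """
    rng = random.Random(seed)
    sim = lambda: rng.random() < p  # simulator: returns True with prob. p
    estimates = [ibs_loglike_estimate(sim, True) for _ in range(n_reps)]
    return sum(estimates) / n_reps
```

In this toy Bernoulli check, `demo()` returns a value close to `math.log(0.5)`, illustrating the unbiasedness claim; for a full data set, the per-trial estimates are simply summed, as the abstract's "entire data set" phrasing suggests.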