Safaai Houman, Onken Arno, Harvey Christopher D, Panzeri Stefano
Department of Neurobiology, Harvard Medical School, Boston, MA.
Istituto Italiano di Tecnologia, Rovereto, Italy.
Phys Rev E. 2018 Nov;98(5). doi: 10.1103/PhysRevE.98.053302. Epub 2018 Nov 5.
Estimation of mutual information between random variables has become crucial in a range of fields, from physics to neuroscience to finance. Estimating information accurately over a wide range of conditions relies on the development of flexible methods to describe statistical dependencies among variables, without imposing potentially invalid assumptions on the data. Such methods are needed when prior knowledge of the data's statistical properties is lacking and sample numbers are limited. Here we propose a powerful and generally applicable information estimator based on non-parametric copulas. This estimator, called the non-parametric copula-based estimator (NPC), is tailored to take into account detailed stochastic relationships in the data independently of the data's marginal distributions. The NPC estimator can be used for both continuous and discrete numerical variables and thus provides a single framework for mutual information estimation on both kinds of data. By extensive validation on artificial samples drawn from various statistical distributions, we found that the NPC estimator compares well against commonly used alternatives. Unlike methods not based on copulas, it allows an estimation of information that is robust to changes in the details of the marginal distributions. Unlike parametric copula methods, it remains accurate regardless of the precise form of the interactions between the variables. In addition, compared with alternative estimators, the NPC estimator gave accurate information estimates even at low sample numbers. The NPC estimator therefore provides a good balance between general applicability to arbitrarily shaped statistical dependencies in the data and accurate, robust performance when working with small sample sizes. We anticipate that the non-parametric copula information estimator will be a powerful tool for estimating mutual information across a broad range of data.
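The claim that a copula-based estimate does not depend on the marginal distributions can be illustrated with a small sketch. This is not the NPC estimator itself: it swaps in a coarse histogram of the rank-transformed data as a stand-in for a proper non-parametric copula density estimate, and the function names `pseudo_observations` and `copula_histogram_mi`, the bin count, and the test data are all assumptions made for illustration.

```python
import numpy as np

def pseudo_observations(x):
    """Map samples into (0, 1) through their ranks (empirical copula transform)."""
    x = np.asarray(x)
    ranks = np.argsort(np.argsort(x)) + 1  # ranks 1..n, ties broken by position
    return ranks / (len(x) + 1.0)

def copula_histogram_mi(x, y, bins=8):
    """Plug-in mutual information estimate (in bits) from a 2D histogram
    of the rank-transformed samples. Illustrative stand-in only."""
    u, v = pseudo_observations(x), pseudo_observations(y)
    counts, _, _ = np.histogram2d(u, v, bins=bins, range=[[0, 1], [0, 1]])
    p = counts / counts.sum()            # joint cell probabilities
    pu = p.sum(axis=1, keepdims=True)    # marginal over v, shape (bins, 1)
    pv = p.sum(axis=0, keepdims=True)    # marginal over u, shape (1, bins)
    nz = p > 0                           # skip empty cells (0 * log 0 = 0)
    return float(np.sum(p[nz] * np.log2(p[nz] / (pu @ pv)[nz])))

# Synthetic data (an assumption of this sketch): correlated Gaussian pair.
rng = np.random.default_rng(0)
n = 5000
x = rng.standard_normal(n)
y = 0.8 * x + 0.6 * rng.standard_normal(n)

mi_raw = copula_histogram_mi(x, y)
# Strictly increasing transforms change the marginals but not the ranks.
mi_warped = copula_histogram_mi(np.exp(x), y ** 3)
# Independent pair for comparison: the estimate should be close to zero.
mi_indep = copula_histogram_mi(x, rng.standard_normal(n))

print(mi_raw, mi_warped, mi_indep)
```

Because the estimate is computed from ranks alone, `mi_raw` and `mi_warped` coincide even though `np.exp(x)` and `y ** 3` have very different marginals. A histogram is a crude density estimate and discretization biases the result, which is the kind of limitation a smoother non-parametric copula estimate is meant to address.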