Department of Statistics and Applied Probability, National University of Singapore, Singapore 117546.
Proc Natl Acad Sci U S A. 2018 Oct 2;115(40):9956-9961. doi: 10.1073/pnas.1715593115. Epub 2018 Sep 17.
Quantifying the dependence between two random variables is a fundamental issue in data analysis, and thus many measures have been proposed. Recent studies have focused on the renowned mutual information (MI) [Reshef DN, et al. (2011) 334:1518-1524]. However, "Unfortunately, reliably estimating mutual information from finite continuous data remains a significant and unresolved problem" [Kinney JB, Atwal GS (2014) 111:3354-3359]. In this paper, we examine the kernel estimation of MI and show that the bandwidths involved should be equalized. We consider a jackknife version of the kernel estimate with equalized bandwidth and allow the bandwidth to vary over an interval. We estimate the MI by the largest value among these kernel estimates and establish the associated theoretical underpinnings.
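The abstract does not state which kernel is used, what the bandwidth interval is, or how exactly the jackknife estimate is formed. The sketch below is therefore only an illustration under our own assumptions, not the authors' implementation. It assumes a Gaussian product kernel and standardizes both variables to unit variance so that one shared ("equalized") bandwidth h is meaningful. It reads "jackknife version" as the leave-one-out plug-in estimate, I(h) = (1/n) Σ_i log[ f_-i(X_i, Y_i) / (f_-i(X_i) f_-i(Y_i)) ], and replaces the interval with a hypothetical grid of h values. Finally, it reports the largest estimate over that grid. The function names (`standardize`, `loo_kernel_mi`, `max_kernel_mi`) are hypothetical.

```python
import math
import random


def standardize(v):
    """Rescale a sample to mean 0 and unit sample standard deviation."""
    n = len(v)
    mean = sum(v) / n
    sd = math.sqrt(sum((a - mean) ** 2 for a in v) / (n - 1))
    return [(a - mean) / sd for a in v]


def gauss_kernel(u):
    """Standard normal density, used here as the (assumed) kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def loo_kernel_mi(x, y, h):
    """Leave-one-out kernel MI estimate using ONE bandwidth h for both variables.

    For each i, the joint and marginal densities at (x[i], y[i]) are estimated
    from the other n - 1 points, and the log density ratios are averaged.
    """
    n = len(x)
    total = 0.0
    for i in range(n):
        sx = sy = sxy = 0.0
        for j in range(n):
            if j == i:
                continue  # drop point i: the leave-one-out step
            kx = gauss_kernel((x[i] - x[j]) / h)
            ky = gauss_kernel((y[i] - y[j]) / h)
            sx += kx
            sy += ky
            sxy += kx * ky  # product kernel for the joint density
        # f_xy = sxy/((n-1)h^2), f_x = sx/((n-1)h), f_y = sy/((n-1)h),
        # so f_xy / (f_x * f_y) = (n - 1) * sxy / (sx * sy); h cancels out.
        total += math.log((n - 1) * sxy / (sx * sy))
    return total / n


def max_kernel_mi(x, y, h_grid):
    """Return (largest leave-one-out estimate over h_grid, the h attaining it)."""
    xs, ys = standardize(x), standardize(y)
    best, best_h = -math.inf, None
    for h in h_grid:
        est = loo_kernel_mi(xs, ys, h)
        if est > best:
            best, best_h = est, h
    return best, best_h


if __name__ == "__main__":
    random.seed(0)
    # Bivariate normal, correlation 0.8: true MI = -0.5*log(1 - 0.64) ≈ 0.511 nats.
    xs = [random.gauss(0.0, 1.0) for _ in range(200)]
    ys = [0.8 * a + 0.6 * random.gauss(0.0, 1.0) for a in xs]
    h_grid = [0.25 + 0.1 * k for k in range(8)]  # hypothetical interval [0.25, 0.95]
    est, h = max_kernel_mi(xs, ys, h_grid)
    print(f"max kernel MI estimate = {est:.3f} nats at h = {h:.2f}")
```

Because both variables are standardized before the common bandwidth is applied, rescaling either variable leaves the estimate unchanged. That is one motivation for using a single bandwidth rather than tuning two separately. The double loop costs O(n^2) per bandwidth, which is acceptable for a few hundred points but not for large samples.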