Zhang Jialin, Chen Chen
Department of Mathematics and Statistics, University of North Carolina at Charlotte, Charlotte, NC 28223, USA.
Stat Appl Genet Mol Biol. 2018 Mar 30;17(2):sagmb-2018-0005. doi: 10.1515/sagmb-2018-0005.
Zhang, Z. and Zheng, L. (2015): "A mutual information estimator with exponentially decaying bias," Stat. Appl. Genet. Mol. Biol., 14, 243-252, proposed a nonparametric estimator of mutual information developed from an entropic perspective, and demonstrated that it has much smaller bias than the plug-in estimator while retaining the same asymptotic normality under certain conditions. However, their article incorrectly suggests that this asymptotic normality can be used for testing independence between two random elements on a joint alphabet. When the two random elements are independent, the asymptotic distribution of the $\sqrt{n}$-normed estimator degenerates, and therefore the claimed normality does not hold. This article complements Zhang and Zheng by establishing a new chi-square test, based on the same entropic statistics, for the hypothesis that the mutual information is zero. The three examples in Zhang and Zheng are re-worked using the new test. The results turn out to be much more sensible and further illustrate the advantage of the entropic perspective in statistical inference on alphabets. More specifically, in Example 2, where a positive mutual information is known to exist, the new test detects it but the log likelihood ratio test fails to do so.
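For orientation, the sketch below illustrates the classical log likelihood ratio test that the abstract uses as a point of comparison. It is not the entropic chi-square test proposed in the article. It relies on the standard result that, under independence, $2n$ times the plug-in mutual information (in nats) is asymptotically chi-square with $(K_1-1)(K_2-1)$ degrees of freedom for a $K_1 \times K_2$ table. This also shows why $\sqrt{n}$ normalization degenerates in that case: the estimate shrinks at rate $1/n$, not $1/\sqrt{n}$. The function names and the example table are hypothetical, and the code assumes every row and column of the table has a positive total.

```python
import math


def plugin_mutual_information(table):
    """Plug-in mutual information (in nats) of a contingency table of counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    mi = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            if count > 0:  # empty cells contribute 0 (0 * log 0 := 0)
                mi += (count / n) * math.log(
                    count * n / (row_totals[i] * col_totals[j])
                )
    return mi


def g_statistic(table):
    """Log likelihood ratio statistic G = 2 * n * MI and its degrees of freedom.

    Assumes every row and column total is positive, so that the degrees of
    freedom are (rows - 1) * (columns - 1).
    """
    n = sum(sum(row) for row in table)
    g = 2.0 * n * plugin_mutual_information(table)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return g, df


# Hypothetical 2 x 3 table of joint counts, for illustration only.
example_table = [[12, 7, 9], [8, 13, 11]]
g, df = g_statistic(example_table)
print(f"G = {g:.3f} on {df} degrees of freedom")
```

The resulting G would be compared with an upper quantile of the chi-square distribution with the stated degrees of freedom. The article's new test replaces the plug-in statistic with the entropic estimator, whose exact form is not given in this abstract.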