
Estimating the Mutual Information between Two Discrete, Asymmetric Variables with Limited Samples.

Author Information

Hernández Damián G, Samengo Inés

Affiliation

Department of Medical Physics, Centro Atómico Bariloche and Instituto Balseiro, 8400 San Carlos de Bariloche, Argentina.

Publication Information

Entropy (Basel). 2019 Jun 25;21(6):623. doi: 10.3390/e21060623.

Abstract

Determining the strength of nonlinear, statistical dependencies between two variables is a crucial matter in many research fields. The established measure for quantifying such relations is the mutual information. However, estimating mutual information from limited samples is a challenging task. Since the mutual information is the difference of two entropies, the existing Bayesian estimators of entropy may be used to estimate information. This procedure, however, is still biased in the severely under-sampled regime. Here, we propose an alternative estimator that is applicable to those cases in which the marginal distribution of one of the two variables (the one with minimal entropy) is well sampled. The other variable, as well as the joint and conditional distributions, can be severely undersampled. We obtain a consistent estimator that presents very low bias, outperforming previous methods even when the sampled data contain few coincidences. As with other Bayesian estimators, our proposal focuses on the strength of the interaction between the two variables, without seeking to model the specific way in which they are related. A distinctive property of our method is that the main data statistic determining the amount of mutual information is the inhomogeneity of the conditional distribution of the low-entropy variable in those states in which the large-entropy variable registers coincidences.
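As context for the abstract's point that mutual information is the difference of two entropies and that naive estimates are biased with few samples, the following sketch implements the standard plug-in (maximum-likelihood) estimator. This is not the Bayesian estimator proposed in the paper; it is the baseline whose bias in the under-sampled regime motivates that work, and the function name is illustrative.

```python
import math
from collections import Counter

def plugin_mutual_information(pairs):
    """Naive plug-in estimate of I(X;Y) in nats from (x, y) samples.

    Uses empirical frequencies in place of the true probabilities:
    I(X;Y) = sum_{x,y} p(x,y) * log( p(x,y) / (p(x) p(y)) ).
    With few samples this estimator is strongly biased upward,
    which is the regime the paper's Bayesian estimator addresses.
    """
    n = len(pairs)
    joint = Counter(pairs)                 # counts of (x, y) pairs
    px = Counter(x for x, _ in pairs)      # marginal counts of x
    py = Counter(y for _, y in pairs)      # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        # p_xy / (p_x * p_y) written with counts to avoid extra divisions
        mi += p_xy * math.log(p_xy * n * n / (px[x] * py[y]))
    return mi
```

For perfectly correlated binary samples the estimate recovers log 2 nats; for samples drawn uniformly over all four (x, y) combinations it returns 0, as expected for independence.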


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1ece/7515115/95a795d815c5/entropy-21-00623-g001.jpg
