
Coverage-adjusted entropy estimation.

Authors

Vu Vincent Q, Yu Bin, Kass Robert E

Affiliation

Department of Statistics, University of California, Berkeley, CA 94720-3860, USA.

Publication

Stat Med. 2007 Sep 20;26(21):4039-60. doi: 10.1002/sim.2942.

Abstract

Data on 'neural coding' have frequently been analyzed using information-theoretic measures. These formulations involve the fundamental and generally difficult statistical problem of estimating entropy. We briefly review several methods that have been advanced to estimate entropy, and highlight a method, the coverage-adjusted entropy estimator (CAE), due to Chao and Shen, that appeared recently in the environmental statistics literature. This method begins with the elementary Horvitz-Thompson estimator, developed for sampling from a finite population, and adjusts for potential new species that have not yet been observed in the sample; in the spike-train setting, these become new patterns, or 'words', that have not yet been observed. The adjustment is due to I. J. Good and is called the Good-Turing coverage estimate. We provide a new empirical regularization derivation of the coverage-adjusted probability estimator, which shrinks the maximum likelihood estimate. We prove that the CAE is consistent and first-order optimal, with rate O_P(1/log n), in the class of distributions with finite entropy variance, and that, within the class of distributions with finite qth moment of the log-likelihood, the Good-Turing coverage estimate and the total probability of unobserved words converge at rate O_P(1/(log n)^q). We then provide a simulation study of the estimator with standard distributions, and examples from neuronal data, where observations are dependent. The results show that, with a minor modification, the CAE performs much better than the MLE and is better than the best upper bound estimator, due to Paninski, when the number of possible words m is unknown or infinite.
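The construction described in the abstract (shrink each maximum likelihood word probability by the Good-Turing coverage estimate, then reweight each entropy term by a Horvitz-Thompson inclusion probability) can be sketched in a few lines. This is a minimal illustrative sketch for i.i.d. samples, not the authors' code; in particular, the small-sample guard used when every observed word is a singleton is an assumption of this sketch, not part of the paper.

```python
import math
from collections import Counter

def coverage_adjusted_entropy(samples):
    """Chao-Shen coverage-adjusted entropy estimator (CAE), in nats.

    Combines the Good-Turing coverage estimate with a Horvitz-Thompson
    correction for words not yet observed in the sample.
    """
    counts = Counter(samples)
    n = sum(counts.values())
    # Good-Turing coverage estimate: one minus the fraction of singletons.
    f1 = sum(1 for c in counts.values() if c == 1)
    coverage = 1.0 - f1 / n
    if coverage == 0.0:       # every word seen exactly once;
        coverage = 1.0 / n    # guard value is an assumption of this sketch
    h = 0.0
    for c in counts.values():
        # Coverage-adjusted (shrunken) MLE probability of this word.
        p = coverage * c / n
        # Horvitz-Thompson weight: inverse of the probability that the
        # word appears at least once in a sample of size n.
        h -= p * math.log(p) / (1.0 - (1.0 - p) ** n)
    return h
```

For a well-sampled uniform source the coverage estimate is 1 and the estimator reduces to (approximately) the plug-in MLE; the adjustment matters precisely when singletons are common, i.e. when many words remain unobserved.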

