

A Maximum-Entropy Method to Estimate Discrete Distributions from Samples Ensuring Nonzero Probabilities.

Authors

Darscheid Paul, Guthke Anneli, Ehret Uwe

Affiliations

Institute of Water Resources and River Basin Management, Karlsruhe Institute of Technology-KIT, 76131 Karlsruhe, Germany.

Institute for Modelling Hydraulic and Environmental Systems (IWS), University of Stuttgart, 70569 Stuttgart, Germany.

Publication

Entropy (Basel). 2018 Aug 13;20(8):601. doi: 10.3390/e20080601.

DOI: 10.3390/e20080601
PMID: 33265690
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7513126/
Abstract

When constructing discrete (binned) distributions from samples of a data set, there are applications where it is desirable to ensure that all bins of the sample distribution have nonzero probability: for example, when the sample distribution is part of a predictive model that must return a response over the entire codomain, or when the Kullback-Leibler divergence is used to measure the (dis)agreement of the sample distribution and the original distribution of the variable, which in that case is inconveniently infinite. Several sample-based distribution estimators exist which assure nonzero bin probability, such as adding one counter to each zero-probability bin of the sample histogram, adding a small probability to the sample pdf, smoothing methods such as kernel-density smoothing, or Bayesian approaches based on the Dirichlet and multinomial distributions. Here, we suggest and test an approach based on the Clopper-Pearson method, which makes use of the binomial distribution. Based on the sample distribution, confidence intervals for the bin-occupation probability are calculated. The mean of each confidence interval is a strictly positive estimator of the true bin-occupation probability and is convergent with increasing sample size. For small samples, it converges towards a uniform distribution, i.e., the method effectively applies a maximum-entropy approach. We apply this nonzero method and four alternative sample-based distribution estimators to a range of typical distributions (uniform, Dirac, normal, multimodal, and irregular) and measure the effect with the Kullback-Leibler divergence. While the performance of each method strongly depends on the distribution type it is applied to, on average, and especially for small sample sizes, the nonzero method, the simple "add one counter" method, and the Bayesian Dirichlet-multinomial model show very similar behavior and perform best. We conclude that, when estimating distributions without an a priori idea of their shape, applying one of these methods is favorable.
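The estimator described in the abstract can be sketched in a few lines: compute an exact Clopper-Pearson confidence interval for each bin's occupation probability, take the interval midpoint (strictly positive even for empty bins), and renormalize. This is a minimal illustration under our own assumptions, not the authors' code; the function names, the 95% confidence level, and the pure-stdlib bisection of the binomial CDF (in place of the usual beta-quantile shortcut) are all choices made here for the sketch.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def clopper_pearson(k, n, alpha=0.05):
    """Exact (Clopper-Pearson) 1-alpha confidence interval for a binomial
    proportion, found by bisecting the binomial CDF in p.
    (Beta-distribution quantiles are the usual shortcut; pure stdlib here.)"""
    def bisect(cond):
        lo, hi = 0.0, 1.0
        for _ in range(60):  # bisection to ~1e-18 resolution
            mid = (lo + hi) / 2
            if cond(mid):
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2
    # lower bound solves P(X <= k-1 | p) = 1 - alpha/2; upper solves P(X <= k | p) = alpha/2
    lower = 0.0 if k == 0 else bisect(lambda p: binom_cdf(k - 1, n, p) > 1 - alpha / 2)
    upper = 1.0 if k == n else bisect(lambda p: binom_cdf(k, n, p) > alpha / 2)
    return lower, upper

def nonzero_estimate(counts, alpha=0.05):
    """Midpoint of each bin's Clopper-Pearson interval, renormalized to sum
    to one. Every bin probability is strictly positive, and an empty sample
    yields the uniform (maximum-entropy) distribution."""
    n = sum(counts)
    mids = [sum(clopper_pearson(k, n, alpha)) / 2 for k in counts]
    total = sum(mids)
    return [m / total for m in mids]
```

For a histogram like `[0, 3, 7]`, the empty first bin still receives a strictly positive probability, so the Kullback-Leibler divergence to the true distribution stays finite, whereas the raw relative-frequency histogram would make it infinite; with no data at all, `nonzero_estimate([0, 0, 0])` returns the uniform distribution, matching the maximum-entropy behavior the abstract describes.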


Figures (PMC full text):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c81f/7513126/1c7a08cc5591/entropy-20-00601-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c81f/7513126/a38890dfd337/entropy-20-00601-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c81f/7513126/513a80b92f46/entropy-20-00601-g003.jpg

Similar Articles

1. A Maximum-Entropy Method to Estimate Discrete Distributions from Samples Ensuring Nonzero Probabilities. Entropy (Basel). 2018 Aug 13;20(8):601. doi: 10.3390/e20080601.
2. Bayesian estimation of the Kullback-Leibler divergence for categorical systems using mixtures of Dirichlet priors. Phys Rev E. 2024 Feb;109(2-1):024305. doi: 10.1103/PhysRevE.109.024305.
3. Computing Accurate Probabilistic Estimates of One-D Entropy from Equiprobable Random Samples. Entropy (Basel). 2021 Jun 11;23(6):740. doi: 10.3390/e23060740.
4. Low-probability states, data statistics, and entropy estimation. Phys Rev E. 2023 Jul;108(1-1):014101. doi: 10.1103/PhysRevE.108.014101.
5. Information estimators for weighted observations. Neural Netw. 2013 Oct;46:260-75. doi: 10.1016/j.neunet.2013.06.005. Epub 2013 Jun 24.
6. Maximum likelihood set for estimating a probability mass function. Neural Comput. 2005 Jul;17(7):1508-30. doi: 10.1162/0899766053723078.
7. Computation of Kullback-Leibler Divergence in Bayesian Networks. Entropy (Basel). 2021 Aug 28;23(9):1122. doi: 10.3390/e23091122.
8. Estimating probabilities from experimental frequencies. Phys Rev E Stat Nonlin Soft Matter Phys. 2002 Apr;65(4 Pt 2A):046124. doi: 10.1103/PhysRevE.65.046124. Epub 2002 Apr 4.
9. Minimax Estimation of Functionals of Discrete Distributions. IEEE Trans Inf Theory. 2015 May;61(5):2835-2885. doi: 10.1109/tit.2015.2412945. Epub 2015 Mar 13.
10. Efficient First-Order Algorithms for Large-Scale, Non-Smooth Maximum Entropy Models with Application to Wildfire Science. Entropy (Basel). 2024 Aug 15;26(8):691. doi: 10.3390/e26080691.

Cited By

1. Computing Accurate Probabilistic Estimates of One-D Entropy from Equiprobable Random Samples. Entropy (Basel). 2021 Jun 11;23(6):740. doi: 10.3390/e23060740.

References

1. Estimating mutual information. Phys Rev E Stat Nonlin Soft Matter Phys. 2004 Jun;69(6 Pt 2):066138. doi: 10.1103/PhysRevE.69.066138. Epub 2004 Jun 23.
2. On small-sample confidence intervals for parameters in discrete distributions. Biometrics. 2001 Sep;57(3):963-71. doi: 10.1111/j.0006-341x.2001.00963.x.
3. Independent coordinates for strange attractors from mutual information. Phys Rev A Gen Phys. 1986 Feb;33(2):1134-1140. doi: 10.1103/physreva.33.1134.
4. Confidence intervals for a binomial proportion. Stat Med. 1993 May 15;12(9):809-24. doi: 10.1002/sim.4780120902.