Maximum likelihood set for estimating a probability mass function.

Authors

Jedynak Bruno M, Khudanpur Sanjeev

Affiliation

Département de Mathématiques, Université des Sciences et Technologies de Lille, France.

Publication

Neural Comput. 2005 Jul;17(7):1508-30. doi: 10.1162/0899766053723078.

DOI: 10.1162/0899766053723078
PMID: 15901406
Abstract

We propose a new method for estimating the probability mass function (pmf) of a discrete and finite random variable from a small sample. We focus on the observed counts (the number of times each value appears in the sample) and define the maximum likelihood set (MLS) as the set of pmfs that put more mass on the observed counts than on any other set of counts possible for the same sample size. We characterize the MLS in detail in this article. We show that the MLS is a diamond-shaped subset of the probability simplex [0,1]^k bounded by at most k × (k-1) hyperplanes, where k is the number of possible values of the random variable. The MLS always contains the empirical distribution, as well as a family of Bayesian estimators based on a Dirichlet prior, particularly the well-known Laplace estimator. We propose to select from the MLS the pmf that is closest to a fixed pmf that encodes prior knowledge. When using Kullback-Leibler distance for this selection, the optimization problem comprises finding the minimum of a convex function over a domain defined by linear inequalities, for which standard numerical procedures are available. We apply this estimate to language modeling using Zipf's law to encode prior knowledge and show that this method permits obtaining state-of-the-art results while being conceptually simpler than most competing methods.
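
The MLS definition in the abstract can be illustrated with a small sketch. The brute-force check below follows the definition directly: it enumerates every count vector with the same sample size and tests whether the observed counts are the most probable under a candidate pmf. The pairwise check is an assumption, not taken from the article: it encodes only the comparison against adjacent count vectors (one observation moved from value i to value j), which yields k × (k-1) linear inequalities n_i p_j ≤ (n_j + 1) p_i. That is consistent with the hyperplane bound stated in the abstract, but whether it matches the article's exact characterization is not confirmed here. All function names are hypothetical.

```python
from math import factorial, prod


def multinomial_pmf(counts, p):
    """Probability of observing `counts` in sum(counts) i.i.d. draws from pmf `p`."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)  # exact at every step: a multinomial coefficient is an integer
    return coef * prod(pi ** c for pi, c in zip(p, counts))


def compositions(n, k):
    """Yield every length-k tuple of non-negative integers that sums to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest


def in_mls_bruteforce(counts, p, tol=1e-12):
    """Definition-based test: no other count vector is more probable under p."""
    target = multinomial_pmf(counts, p)
    return all(
        multinomial_pmf(m, p) <= target + tol
        for m in compositions(sum(counts), len(counts))
    )


def in_mls_pairwise(counts, p, tol=1e-12):
    """Assumed linear form: compare only against adjacent count vectors."""
    k = len(counts)
    return all(
        counts[i] * p[j] <= (counts[j] + 1) * p[i] + tol
        for i in range(k)
        for j in range(k)
        if i != j
    )


counts = (3, 1, 0)                      # toy sample: N = 4 draws, k = 3 values
n, k = sum(counts), len(counts)
empirical = tuple(c / n for c in counts)
laplace = tuple((c + 1) / (n + k) for c in counts)   # add-one estimator
uniform = tuple(1 / k for _ in counts)

for name, p in [("empirical", empirical), ("laplace", laplace), ("uniform", uniform)]:
    print(name, in_mls_bruteforce(counts, p), in_mls_pairwise(counts, p))
```

On this toy sample the empirical and Laplace estimates pass both checks, in line with the abstract's claim that the MLS contains them, while the uniform pmf fails because it makes the count vector (2, 1, 1) more probable than the observed (3, 1, 0). The brute-force enumeration grows quickly with N and k, so it is only practical for small illustrations.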


Similar articles

1. Asymptotic optimality of likelihood-based cross-validation.
   Stat Appl Genet Mol Biol. 2004;3:Article4. doi: 10.2202/1544-6115.1036. Epub 2004 Mar 22.
2. Modeling motor vehicle crashes using Poisson-gamma models: examining the effects of low sample mean values and small sample size on the estimation of the fixed dispersion parameter.
   Accid Anal Prev. 2006 Jul;38(4):751-66. doi: 10.1016/j.aap.2006.02.001. Epub 2006 Mar 20.
3. Estimating population size when duplicates are present.
   Stat Med. 1996 Aug 15;15(15):1635-46. doi: 10.1002/(SICI)1097-0258(19960815)15:15<1635::AID-SIM337>3.0.CO;2-T.
4. Application of the split-gradient method to 3D image deconvolution in fluorescence microscopy.
   J Microsc. 2009 Apr;234(1):47-61. doi: 10.1111/j.1365-2818.2009.03150.x.
5. Comparison of two platelet count estimation methodologies for peripheral blood smears.
   Clin Lab Sci. 2007 Summer;20(3):154-60.
6. Simultaneous beam geometry and intensity map optimization in intensity-modulated radiation therapy.
   Int J Radiat Oncol Biol Phys. 2006 Jan 1;64(1):301-20. doi: 10.1016/j.ijrobp.2005.08.023. Epub 2005 Nov 14.
7. Modeling disease incidence data with spatial and spatio temporal dirichlet process mixtures.
   Biom J. 2008 Feb;50(1):29-42. doi: 10.1002/bimj.200610375.
8. A Maximum-Entropy Method to Estimate Discrete Distributions from Samples Ensuring Nonzero Probabilities.
   Entropy (Basel). 2018 Aug 13;20(8):601. doi: 10.3390/e20080601.
9. Coverage-adjusted entropy estimation.
   Stat Med. 2007 Sep 20;26(21):4039-60. doi: 10.1002/sim.2942.

Cited by

1. Rectified Gaussian Scale Mixtures and the Sparse Non-Negative Least Squares Problem.
   IEEE Trans Signal Process. 2018 Jun 15;66(12):3124-3139. doi: 10.1109/tsp.2018.2824286. Epub 2018 Apr 6.
2. Where have all the interactions gone? Estimating the coverage of two-hybrid protein interaction maps.
   PLoS Comput Biol. 2007 Nov;3(11):e214. doi: 10.1371/journal.pcbi.0030214. Epub 2007 Sep 21.