

Efficient First-Order Algorithms for Large-Scale, Non-Smooth Maximum Entropy Models with Application to Wildfire Science.

Authors

Provencher Langlois Gabriel, Buch Jatan, Darbon Jérôme

Affiliations

Courant Institute of Mathematical Sciences, New York University, New York, NY 10012, USA.

Department of Earth and Environmental Engineering, Columbia University, New York, NY 10027, USA.

Publication

Entropy (Basel). 2024 Aug 15;26(8):691. doi: 10.3390/e26080691.

DOI: 10.3390/e26080691
PMID: 39202161
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11353449/
Abstract

Maximum entropy (MaxEnt) models are a class of statistical models that use the maximum entropy principle to estimate probability distributions from data. Due to the size of modern data sets, MaxEnt models need efficient optimization algorithms to scale well for big data applications. State-of-the-art algorithms for MaxEnt models, however, were not originally designed to handle big data sets; these algorithms either rely on technical devices that may yield unreliable numerical results, scale poorly, or require smoothness assumptions that many practical MaxEnt models lack. In this paper, we present novel optimization algorithms that overcome the shortcomings of state-of-the-art algorithms for training large-scale, non-smooth MaxEnt models. Our proposed first-order algorithms leverage the Kullback-Leibler divergence to train large-scale and non-smooth MaxEnt models efficiently. For MaxEnt models with discrete probability distributions built from samples that each contain a set of features, the stepsize parameter estimation and iterations in our algorithms scale on the order of O(mn) operations (where mn is the size of the feature data) and can be trivially parallelized. Moreover, the strong convexity of the Kullback-Leibler divergence with respect to the ℓ1 norm allows for larger stepsize parameters, thereby speeding up the convergence rate of our algorithms. To illustrate the efficiency of our novel algorithms, we consider the problem of estimating probabilities of fire occurrences as a function of ecological features in the Western US MTBS-Interagency wildfire data set. Our numerical results show that our algorithms outperform the state of the art by one order of magnitude and yield results that agree with physical models of wildfire occurrence and previous statistical analyses of wildfire drivers.
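The abstract's key computational claims — first-order iterations whose cost is dominated by O(mn) matrix-vector work, and a multiplicative update structure induced by the Kullback-Leibler divergence — can be illustrated with a generic sketch. The following is not the paper's algorithm but a minimal example of the underlying technique (entropic mirror descent, i.e., a KL-proximal / exponentiated-gradient step) applied to a toy feature-matching objective over the probability simplex; the function name, objective, and all parameter values are illustrative assumptions.

```python
import numpy as np

def entropic_mirror_descent(F, c, steps=500, eta=0.5):
    """Minimize f(p) = 0.5 * ||F.T @ p - c||^2 over the probability simplex
    via mirror descent with the KL divergence (exponentiated gradient).
    Each iteration costs O(m*n): two matrix-vector products for the gradient,
    then an O(m) multiplicative update and renormalization."""
    m, n = F.shape
    p = np.full(m, 1.0 / m)           # start at the uniform distribution
    for _ in range(steps):
        grad = F @ (F.T @ p - c)      # gradient of f at p, O(m*n) work
        p = p * np.exp(-eta * grad)   # KL-prox step: multiplicative update
        p /= p.sum()                  # renormalize back onto the simplex
    return p

# Toy problem: m = 200 samples, n = 5 features; target feature means c are
# a small perturbation of the uniform-distribution feature means.
rng = np.random.default_rng(0)
F = rng.normal(size=(200, 5))
c = F.T @ np.full(200, 1.0 / 200) + 0.01
p = entropic_mirror_descent(F, c)
```

Because the KL-prox step yields a closed-form multiplicative update, iterates stay strictly positive and no Euclidean projection onto the simplex is needed; this is the structural advantage that the paper's strong-ℓ1-convexity argument exploits to justify larger stepsizes.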


Figures:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e50e/11353449/dccf630c4913/entropy-26-00691-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e50e/11353449/e10d5a38c8ce/entropy-26-00691-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e50e/11353449/a1f1292f0551/entropy-26-00691-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e50e/11353449/f9405609fb07/entropy-26-00691-g004.jpg

Similar Articles

1. Efficient First-Order Algorithms for Large-Scale, Non-Smooth Maximum Entropy Models with Application to Wildfire Science. Entropy (Basel). 2024 Aug 15;26(8):691. doi: 10.3390/e26080691.
2. Multitemporal Modelling of Socio-Economic Wildfire Drivers in Central Spain between the 1980s and the 2000s: Comparing Generalized Linear Models to Machine Learning Algorithms. PLoS One. 2016 Aug 24;11(8):e0161344. doi: 10.1371/journal.pone.0161344. eCollection 2016.
3. Numerical Algorithms for Estimating Probability Density Function Based on the Maximum Entropy Principle and Fup Basis Functions. Entropy (Basel). 2021 Nov 23;23(12):1559. doi: 10.3390/e23121559.
4. The estimation of distributions and the minimum relative entropy principle. Evol Comput. 2005 Spring;13(1):1-27. doi: 10.1162/1063656053583469.
5. Information estimators for weighted observations. Neural Netw. 2013 Oct;46:260-75. doi: 10.1016/j.neunet.2013.06.005. Epub 2013 Jun 24.
6. Computation of Kullback-Leibler Divergence in Bayesian Networks. Entropy (Basel). 2021 Aug 28;23(9):1122. doi: 10.3390/e23091122.
7. Some Order Preserving Inequalities for Cross Entropy and Kullback-Leibler Divergence. Entropy (Basel). 2018 Dec 12;20(12):959. doi: 10.3390/e20120959.
8. Maximum Entropy Approach in Dynamic Contrast-Enhanced Magnetic Resonance Imaging. Methods Inf Med. 2017;56(6):461-468. doi: 10.3414/ME17-01-0027. Epub 2018 Feb 10.
9. Entropy production and Kullback-Leibler divergence between stationary trajectories of discrete systems. Phys Rev E Stat Nonlin Soft Matter Phys. 2012 Mar;85(3 Pt 1):031129. doi: 10.1103/PhysRevE.85.031129. Epub 2012 Mar 21.
10. A new DNA sequence entropy-based Kullback-Leibler algorithm for gene clustering. J Appl Genet. 2020 May;61(2):231-238. doi: 10.1007/s13353-020-00543-x. Epub 2020 Jan 24.

References Cited in This Article

1. Elastic Net Regularization Paths for All Generalized Linear Models. J Stat Softw. 2023;106. doi: 10.18637/jss.v106.i01. Epub 2023 Mar 23.
2. Rapid Growth of Large Forest Fires Drives the Exponential Response of Annual Forest-Fire Area to Aridity in the Western United States. Geophys Res Lett. 2022 Mar 16;49(5):e2021GL097131. doi: 10.1029/2021GL097131. Epub 2022 Mar 8.
3. Automatic variable selection in ecological niche modeling: A case study using Cassin's Sparrow (Peucaea cassinii). PLoS One. 2022 Jan 21;17(1):e0257502. doi: 10.1371/journal.pone.0257502. eCollection 2022.
4. Toward a Monte Carlo approach to selecting climate variables in MaxEnt. PLoS One. 2021 Mar 3;16(3):e0237208. doi: 10.1371/journal.pone.0237208. eCollection 2021.
5. Harmonized global maps of above and belowground biomass carbon density in the year 2010. Sci Data. 2020 Apr 6;7(1):112. doi: 10.1038/s41597-020-0444-4.
6. How much does climate change threaten European forest tree species distributions? Glob Chang Biol. 2018 Mar;24(3):1150-1163. doi: 10.1111/gcb.13925. Epub 2017 Oct 30.
7. Maximum entropy models as a tool for building precise neural controls. Curr Opin Neurobiol. 2017 Oct;46:120-126. doi: 10.1016/j.conb.2017.08.001. Epub 2017 Sep 3.
8. Finite-Sample Equivalence in Statistical Models for Presence-Only Data. Ann Appl Stat. 2013 Dec 1;7(4):1917-1939. doi: 10.1214/13-AOAS667.
9. Stimulus-dependent maximum entropy models of neural population codes. PLoS Comput Biol. 2013;9(3):e1002922. doi: 10.1371/journal.pcbi.1002922. Epub 2013 Mar 14.
10. Regularization Paths for Generalized Linear Models via Coordinate Descent. J Stat Softw. 2010;33(1):1-22.