

Constrained Reweighting of Distributions: An Optimal Transport Approach.

Author Information

Chakraborty Abhisek, Bhattacharya Anirban, Pati Debdeep

Affiliations

Department of Statistics, Texas A&M University, College Station, TX 77843, USA.

Publication Information

Entropy (Basel). 2024 Mar 11;26(3):249. doi: 10.3390/e26030249.

Abstract

We commonly encounter the problem of identifying an optimally weight-adjusted version of the empirical distribution of observed data, adhering to predefined constraints on the weights. Such constraints often manifest as restrictions on the moments, tail behavior, shapes, number of modes, etc., of the resulting weight-adjusted empirical distribution. In this article, we substantially enhance the flexibility of such a methodology by introducing a nonparametrically imbued distributional constraint on the weights and developing a general framework leveraging the maximum entropy principle and tools from optimal transport. The key idea is to ensure that the maximum entropy weight-adjusted empirical distribution of the observed data is close to a pre-specified probability distribution in terms of the optimal transport metric, while allowing for subtle departures. The proposed scheme for the re-weighting of observations subject to constraints is reminiscent of the empirical likelihood and related ideas, but offers greater flexibility in applications where parametric distribution-guided constraints arise naturally. The versatility of the proposed framework is demonstrated in the context of three disparate applications where data re-weighting is warranted to satisfy side constraints on the optimization problem at the heart of the statistical task: namely, portfolio allocation, semi-parametric inference for complex surveys, and ensuring algorithmic fairness in machine learning algorithms.
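A minimal numerical sketch of the key idea described in the abstract, not the paper's algorithm: choose maximum-entropy weights on the observed data while penalizing the optimal transport (1-Wasserstein) distance between the weighted empirical distribution and a pre-specified reference distribution. The standard normal reference, the penalty strength `lam`, the grid-based W1 approximation, and the SLSQP solver are all illustrative assumptions introduced here for demonstration only.

```python
# Hypothetical 1-D illustration of constrained reweighting:
# maximize entropy of simplex weights w while keeping the weighted
# empirical CDF of the data close (in 1-Wasserstein distance) to a
# pre-specified reference distribution (here N(0, 1)).
import numpy as np
from scipy import stats
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = rng.normal(loc=0.5, scale=1.3, size=60)   # observed data, mildly misaligned with the reference
grid = np.linspace(-5, 5, 400)                # grid on which CDFs are compared
dt = grid[1] - grid[0]
ref_cdf = stats.norm.cdf(grid)                # pre-specified reference distribution: N(0, 1)
indicator = (x[:, None] <= grid[None, :]).astype(float)  # n x grid matrix of 1{x_i <= t}

lam = 5.0                                     # strength of the optimal transport (W1) penalty
n = len(x)

def objective(w):
    # negative entropy (minimized) plus lam times an approximate W1 distance
    neg_entropy = np.sum(w * np.log(np.clip(w, 1e-12, None)))
    weighted_cdf = w @ indicator              # F_w(t) = sum_i w_i 1{x_i <= t}
    w1 = np.sum(np.abs(weighted_cdf - ref_cdf)) * dt
    return neg_entropy + lam * w1

w0 = np.full(n, 1.0 / n)                      # start from uniform weights
res = minimize(
    objective, w0, method="SLSQP",
    bounds=[(0.0, 1.0)] * n,
    constraints=[{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}],
)
w = res.x
print("W1 to reference, uniform weights:   ", np.sum(np.abs(w0 @ indicator - ref_cdf)) * dt)
print("W1 to reference, reweighted weights:", np.sum(np.abs(w @ indicator - ref_cdf)) * dt)
```

Running the sketch typically shows the W1 distance to the reference shrinking relative to uniform weights, while the entropy term keeps the weights from concentrating on a few observations.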


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4ce4/10969211/f9c709cdb47d/entropy-26-00249-g001.jpg
