Li Xinyang, Peng Xiaoling
Faculty of Science and Technology, BNU-HKBU United International College, Zhuhai 519087, China.
Guangdong Provincial/Zhuhai Key Laboratory of IRADS, BNU-HKBU United International College, Zhuhai 519087, China.
Entropy (Basel). 2025 Feb 27;27(3):249. doi: 10.3390/e27030249.
Pareto distributions are widely applied in fields such as economics, finance, and environmental studies. Modeling real-world data has created a demand for discretized Pareto distributions. In this paper, we propose using mean squared error representative points (MSE-RPs) as the discrete representation of Pareto distributions. We establish the existence and uniqueness of these representative points under certain parameter settings and provide a theoretical k-means algorithm for computing MSE-RPs for Pareto I and Pareto II distributions. To broaden the applicability of MSE-RPs, we compare three approaches to estimating the MSE-RPs of Pareto distributions. Based on an analysis of the estimation bias under different parameters and methods, we recommend estimating the distribution parameters first and then the MSE-RPs for Pareto I and Pareto II distributions. For Pareto III and Pareto IV distributions, we suggest using the Bq quantiles for MSE-RP estimation. Building on this, we analyze the sources of estimation bias and propose an effective method, based on information gain truncation, for determining the number of MSE-RPs. Through simulations and real data studies, we demonstrate that the proposed MSE-RP estimation methods are effective and can accurately fit the empirical distribution function of the data.
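The abstract does not spell out the theoretical k-means algorithm, so the following is only a minimal sketch of the standard Lloyd-type fixed-point iteration for MSE representative points, applied to a Pareto I distribution with density f(x) = α x_m^α / x^(α+1) for x ≥ x_m. It is not claimed to be the authors' procedure. The function names (`pareto1_cdf`, `cell_mean`, `pareto1_mse_rps`), the quantile-based starting points, and the stopping rule are all assumptions made for illustration, and the sketch assumes α > 2 so that the variance (and hence the MSE) is finite.

```python
import math


def pareto1_cdf(x, alpha, xm):
    """CDF of Pareto I: F(x) = 1 - (xm / x)**alpha for x >= xm."""
    if x == math.inf:
        return 1.0
    return 1.0 - (xm / x) ** alpha


def cell_mean(a, b, alpha, xm):
    """Conditional mean E[X | a <= X < b] under Pareto I (requires alpha > 1)."""
    # Closed form of the partial first moment: integral of x * f(x) over [a, b).
    scale = alpha * xm ** alpha / (alpha - 1.0)
    upper = 0.0 if b == math.inf else b ** (1.0 - alpha)
    partial_moment = scale * (a ** (1.0 - alpha) - upper)
    prob = pareto1_cdf(b, alpha, xm) - pareto1_cdf(a, alpha, xm)
    return partial_moment / prob


def pareto1_mse_rps(k, alpha, xm=1.0, tol=1e-12, max_iter=10000):
    """Lloyd-type iteration for k MSE representative points of Pareto I.

    Assumption: alpha > 2, so the distribution has finite variance.
    """
    if alpha <= 2.0:
        raise ValueError("alpha must exceed 2 for a finite mean squared error")
    # Hypothetical initialisation: quantiles at probability levels (i + 0.5) / k.
    pts = [xm * (1.0 - (i + 0.5) / k) ** (-1.0 / alpha) for i in range(k)]
    for _ in range(max_iter):
        # Cell boundaries are the midpoints between neighbouring points.
        bounds = [xm] + [(pts[i] + pts[i + 1]) / 2.0 for i in range(k - 1)] + [math.inf]
        # Each point moves to the conditional mean of its cell.
        new_pts = [cell_mean(bounds[i], bounds[i + 1], alpha, xm) for i in range(k)]
        shift = max(abs(n - p) for n, p in zip(new_pts, pts))
        pts = new_pts
        if shift < tol:
            break
    return pts


if __name__ == "__main__":
    print(pareto1_mse_rps(k=4, alpha=3.0, xm=1.0))
```

With k = 1 the iteration reduces to the distribution mean α x_m / (α − 1), which gives a quick sanity check. For larger k the returned points are a stationary point of the iteration; whether that point is the unique set of MSE-RPs depends on the parameter conditions the paper establishes.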