Reba Felix, Saifudin Toha, Hendradi Rimuljo
Doctoral Program of Mathematics and Natural Sciences, Faculty of Sciences and Technology, Universitas Airlangga, Surabaya, Indonesia.
Mathematics Department, Faculty of Sciences and Technology, Universitas Airlangga, Surabaya, Indonesia.
MethodsX. 2025 Aug 27;15:103586. doi: 10.1016/j.mex.2025.103586. eCollection 2025 Dec.
Goodness-of-Fit (GoF) tests are used to assess how well probability distributions describe environmental data. However, classical tests such as Kolmogorov-Smirnov (KS) and Anderson-Darling (AD) often yield inconsistent results on heterogeneous datasets. Previous studies applied clustering or mixture modeling separately, without integrating them with automated parameter estimation and adaptive weighting. This study introduces a unified framework that combines GoF evaluation, K-Means++ clustering, and a KS-weighted mixture model to improve distribution selection. Seventeen univariate probability distributions were tested on chlorophyll concentration data from the Black Sea, with adequacy assessed by the KS and AD tests and five information criteria. The framework was implemented as a MATLAB GUI that automates the clustering, estimation, model selection, and evaluation steps. When tested across multiple sample sizes and extended to other variables, the GUI proved adaptable and robust. The KS-weighted mixture model provided stable fits for complex datasets, improving interpretability and reducing reliance on single-distribution assumptions. In summary, the framework integrates GoF testing, clustering, and mixture modeling; implements a reproducible workflow via a MATLAB GUI; and enhances robustness while positioning mixture modeling within environmental data analysis.
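The abstract does not give the weighting rule, the estimators, or the full candidate list, so the following is only a minimal Python sketch of the general pipeline it describes (cluster the data with K-Means++, pick the best-fitting distribution per cluster by the KS statistic, then combine the clusters into a KS-weighted mixture). Everything here is an assumption for illustration: the function names (`kmeans_pp_1d`, `best_fit`, `ks_weighted_mixture`) are hypothetical, only three candidate distributions stand in for the paper's seventeen, the AD test and information criteria are omitted, and the weight rule w_j ∝ n_j(1 − D_j) is a guess, not the authors' formula.

```python
import math
import random

# Hypothetical sketch, not the authors' MATLAB implementation.
# Assumes strictly positive data (e.g. concentrations).

def kmeans_pp_1d(data, k, seed=0, iters=100):
    """1-D K-Means with K-Means++ seeding; returns the non-empty clusters."""
    rng = random.Random(seed)
    centers = [rng.choice(data)]
    while len(centers) < k:
        # K-Means++: next center drawn with probability proportional to squared distance
        d2 = [min((x - c) ** 2 for c in centers) for x in data]
        total = sum(d2)
        if total == 0:  # every point coincides with an existing center
            break
        r, acc = rng.uniform(0, total), 0.0
        for x, w in zip(data, d2):
            acc += w
            if w > 0 and acc >= r:
                centers.append(x)
                break
    for _ in range(iters):  # standard Lloyd iterations
        clusters = [[] for _ in centers]
        for x in data:
            j = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
            clusters[j].append(x)
        new = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new == centers:
            break
        centers = new
    return [c for c in clusters if c]

def fit_normal(xs):
    mu = sum(xs) / len(xs)
    sd = math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs)) or 1e-12  # MLE sd
    return lambda x: 0.5 * (1 + math.erf((x - mu) / (sd * math.sqrt(2))))

def fit_lognormal(xs):
    cdf = fit_normal([math.log(x) for x in xs])  # normal fit on log scale
    return lambda x: cdf(math.log(x)) if x > 0 else 0.0

def fit_exponential(xs):
    rate = len(xs) / sum(xs)  # MLE rate
    return lambda x: 1 - math.exp(-rate * x) if x > 0 else 0.0

# Illustrative subset of candidates (the paper tests seventeen).
CANDIDATES = {"normal": fit_normal, "lognormal": fit_lognormal, "exponential": fit_exponential}

def ks_statistic(xs, cdf):
    """One-sample KS statistic D = sup |F_n(x) - F(x)|."""
    xs = sorted(xs)
    n = len(xs)
    return max(max((i + 1) / n - cdf(x), cdf(x) - i / n) for i, x in enumerate(xs))

def best_fit(xs):
    """Return (D, name, cdf) for the candidate with the smallest KS statistic."""
    scored = [(ks_statistic(xs, fit(xs)), name, fit(xs)) for name, fit in CANDIDATES.items()]
    return min(scored, key=lambda t: t[0])

def ks_weighted_mixture(data, k=2, seed=0):
    comps = [(len(c),) + best_fit(c) for c in kmeans_pp_1d(data, k, seed)]
    raw = [n * (1 - d) for n, d, _, _ in comps]  # ASSUMED rule: w_j ∝ n_j * (1 - D_j)
    weights = [r / sum(raw) for r in raw]
    def mixture_cdf(x):
        return sum(w * cdf(x) for w, (_, _, _, cdf) in zip(weights, comps))
    return weights, [(name, d) for _, d, name, _ in comps], mixture_cdf

if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic two-regime data, a stand-in for (not a sample of) the Black Sea data
    data = [rng.lognormvariate(0.0, 0.3) for _ in range(200)] + \
           [rng.lognormvariate(2.0, 0.3) for _ in range(200)]
    weights, comps, F = ks_weighted_mixture(data, k=2)
    print("weights:", [round(w, 3) for w in weights])
    print("components:", comps)
    print("mixture KS D:", round(ks_statistic(data, F), 4))
```

Weighting each component by its cluster size and by how well its fitted distribution passes the KS check is one simple way to let better-fitting clusters contribute more; a real implementation would also use the AD test and information criteria when choosing each component.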