Graphical interface for goodness-of-fit evaluation and clustering of probability distributions in environmental datasets.

Author information

Reba Felix, Saifudin Toha, Hendradi Rimuljo

Affiliations

Doctoral Program of Mathematics and Natural Sciences, Faculty of Sciences and Technology, Universitas Airlangga, Surabaya, Indonesia.

Mathematics Department, Faculty of Sciences and Technology, Universitas Airlangga, Surabaya, Indonesia.

Publication information

MethodsX. 2025 Aug 27;15:103586. doi: 10.1016/j.mex.2025.103586. eCollection 2025 Dec.

Abstract

Goodness-of-Fit (GoF) tests are applied to assess the suitability of probability distributions for environmental data. However, classical methods such as the Kolmogorov-Smirnov (KS) and Anderson-Darling (AD) tests often yield inconsistent outcomes in heterogeneous datasets. Previous studies employed clustering or mixture modeling separately, without integrating them with automated estimation and adaptive weighting. This study introduces a unified framework combining GoF evaluation, K-Means++ clustering, and a KS-weighted mixture model to enhance distribution selection. Seventeen univariate probability distributions were tested on chlorophyll concentration data from the Black Sea, with adequacy assessed via KS and AD tests and five information criteria. The framework was implemented as a MATLAB GUI that automates the clustering, estimation, model selection, and evaluation steps. Tested across multiple sample sizes and extended to other variables, the GUI demonstrated adaptability and robustness. The KS-weighted mixture model provided stable fits for complex datasets, improving interpretability and reducing reliance on single-distribution assumptions.

• Integrates GoF testing, clustering, and mixture modeling
• Implements a reproducible workflow via MATLAB GUI
• Enhances robustness and positions mixture modeling within environmental data analysis
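To make the idea of a KS-weighted mixture more concrete, the following is a minimal sketch in Python rather than MATLAB, and it is not the authors' implementation. The abstract does not state how the KS statistic is turned into a weight, so the inverse-distance rule used here (weight proportional to 1/D) is an assumption, as are all function names, the two candidate distributions, and the synthetic data. The K-Means++ clustering step and the AD test are left out for brevity.

```python
import math
import random


def normal_cdf(x, mu, sigma):
    """CDF of a normal distribution with mean mu and std dev sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def exponential_cdf(x, rate):
    """CDF of an exponential distribution with the given rate."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0


def ks_statistic(sample, cdf):
    """One-sample KS statistic D = sup |F_n(x) - F(x)|."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # Compare the fitted CDF with the empirical CDF just after and just before x.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def fit_candidates(sample):
    """Fit two illustrative candidates by maximum likelihood (hypothetical helper)."""
    n = len(sample)
    mean = sum(sample) / n
    sigma = math.sqrt(sum((x - mean) ** 2 for x in sample) / n)
    return {
        "normal": lambda x: normal_cdf(x, mean, sigma),
        "exponential": lambda x: exponential_cdf(x, 1.0 / mean),
    }


def ks_weights(sample, cdfs, eps=1e-12):
    """ASSUMPTION: weight each candidate by 1/D, normalised to sum to 1."""
    inverse = {name: 1.0 / (ks_statistic(sample, cdf) + eps) for name, cdf in cdfs.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


def mixture_cdf(x, cdfs, weights):
    """Weighted combination of the fitted candidate CDFs."""
    return sum(weights[name] * cdfs[name](x) for name in cdfs)


# Synthetic, positively skewed data standing in for an environmental variable.
random.seed(0)
data = [random.expovariate(2.0) for _ in range(500)]

cdfs = fit_candidates(data)
weights = ks_weights(data, cdfs)
for name in cdfs:
    print(name, "D =", round(ks_statistic(data, cdfs[name]), 4), "weight =", round(weights[name], 4))
```

Under this assumed rule, a candidate with a smaller KS distance contributes more to the mixture, so a single poorly fitting distribution is down-weighted rather than selected or rejected outright. In a clustered workflow, the same fitting and weighting could be applied within each cluster.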

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0481/12423415/879679249915/ga1.jpg
