Semiparametric Clustering: A Robust Alternative to Parametric Clustering.

Author information

Pan Binbin, Dong Huaiqin, Chen Wen-Sheng, Xu Chen

Publication information

IEEE Trans Neural Netw Learn Syst. 2019 Sep;30(9):2583-2597. doi: 10.1109/TNNLS.2018.2884790. Epub 2018 Dec 28.

Abstract

Clustering aims to group data naturally according to the underlying data distribution. The distribution is often estimated with a parametric or nonparametric model, e.g., a Gaussian mixture or kernel density estimation. Compared with nonparametric models, parametric models are statistically stable, i.e., a small perturbation of the data points leads to only a small change in the estimated density. However, parametric models are highly sensitive to outliers, because in the presence of outliers the data distribution departs far from the parametric assumptions. Given a parametric clustering algorithm, this paper shows how to turn it into a robust one. The idea is to modify the original parametric density into a semiparametric one. The high-density data that form the core of each cluster are modeled with the original parametric density. The low-density data are often far from the cluster cores and may have an arbitrary shape, and are therefore modeled with a nonparametric density. A combination of parametric and nonparametric clustering algorithms is then used to group the data modeled by the semiparametric density. From the standpoint of robust statistics, the proposed method has good robustness properties. We test the proposed algorithm on several synthetic data sets and 70 UCI data sets. The results indicate that the semiparametric method can significantly improve clustering performance.
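The split between a parametric core and a nonparametric remainder can be illustrated with a small one-dimensional sketch. This is not the algorithm from the paper, only a toy under stated assumptions: k-means stands in for "a parametric clustering algorithm", a Gaussian kernel density estimate ranks points by density, and low-density points inherit the label of their nearest already-labeled neighbor. The names `semiparametric_cluster`, `core_fraction`, and the bandwidth `h` are hypothetical and chosen only for illustration.

```python
import math

def kde(points, h):
    """Gaussian kernel density estimate evaluated at every data point (1-D)."""
    norm = len(points) * h * math.sqrt(2 * math.pi)
    return [sum(math.exp(-0.5 * ((x - y) / h) ** 2) for y in points) / norm
            for x in points]

def quantile_init(points, k):
    """Pick k starting centers spread evenly over the sorted points."""
    s = sorted(points)
    return [s[(2 * j + 1) * len(s) // (2 * k)] for j in range(k)]

def kmeans_1d(points, centers, iters=50):
    """Lloyd's algorithm in 1-D; a stand-in for the parametric clusterer."""
    for _ in range(iters):
        groups = [[] for _ in centers]
        for x in points:
            j = min(range(len(centers)), key=lambda c: abs(x - centers[c]))
            groups[j].append(x)
        # Keep the old center if a group happens to be empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers

def semiparametric_cluster(points, k=2, h=1.0, core_fraction=0.8):
    """Toy sketch (assumption, not the paper's method):
    parametric fit on the high-density core, nonparametric labeling of the rest."""
    dens = kde(points, h)
    order = sorted(range(len(points)), key=lambda i: dens[i], reverse=True)
    n_core = max(k, int(core_fraction * len(points)))
    core = order[:n_core]
    core_pts = [points[i] for i in core]

    # Parametric step: fit cluster centers using core points only.
    centers = kmeans_1d(core_pts, quantile_init(core_pts, k))
    labels = [None] * len(points)
    for i in core:
        labels[i] = min(range(k), key=lambda c: abs(points[i] - centers[c]))

    # Nonparametric step: visit remaining points in decreasing density;
    # each inherits the label of its nearest already-labeled point.
    for i in order[n_core:]:
        labeled = [j for j in range(len(points)) if labels[j] is not None]
        nearest = min(labeled, key=lambda j: abs(points[i] - points[j]))
        labels[i] = labels[nearest]
    return labels, centers

# Two tight groups near 0 and 10, plus one outlier at 40.
points = [-1.0, -0.5, 0.0, 0.5, 1.0, 9.0, 9.5, 10.0, 10.5, 11.0, 40.0]
labels, centers = semiparametric_cluster(points)
plain = kmeans_1d(points, quantile_init(points, 2))  # k-means on all points

print("semiparametric centers:", centers)
print("plain k-means centers: ", plain)
print("labels:", labels)
```

In this toy data set, plain k-means lets the outlier drag the second center away from 10, while the core-only fit keeps both centers close to the dense groups and still assigns the outlier a label through its nearest neighbor. The choice of `core_fraction` and `h` is arbitrary here; the paper's actual criterion for separating high-density from low-density data is not reproduced.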
