Semiparametric Clustering: A Robust Alternative to Parametric Clustering.

Authors

Pan Binbin, Dong Huaiqin, Chen Wen-Sheng, Xu Chen

Publication information

IEEE Trans Neural Netw Learn Syst. 2019 Sep;30(9):2583-2597. doi: 10.1109/TNNLS.2018.2884790. Epub 2018 Dec 28.

DOI: 10.1109/TNNLS.2018.2884790
PMID: 30602425

Abstract

Clustering aims at naturally grouping the data according to the underlying data distribution. The data distribution is often estimated using a parametric or nonparametric model, e.g., Gaussian mixture or kernel density estimation. Compared with nonparametric models, parametric models are statistically stable, i.e., a small perturbation of data points leads to a small change in the estimated density. However, parametric models are highly sensitive to outliers because the data distribution is far away from the parametric assumptions in the presence of outliers. Given a parametric clustering algorithm, this paper shows how to turn this algorithm into a robust one. The idea is to modify the original parametric density into a semiparametric one. The high-density data that form the core of each cluster are modeled with the original parametric density. The low-density data are often far away from the cluster cores and may have an arbitrary shape, thus are modeled using a nonparametric density. A combination of parametric and nonparametric clustering algorithms is used to group the data modeled as a semiparametric density. From the robust statistical point of view, the proposed method has good robustness properties. We test the proposed algorithm on several synthetic and 70 UCI data sets. The results indicate that the semiparametric method could significantly improve the clustering performance.
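
As a rough illustration of the idea described in the abstract, the following Python sketch clusters 1-D data in two stages: a kernel density estimate separates high-density core points from low-density points, a parametric-style step groups the core, and a nonparametric rule labels the rest. This is not the authors' algorithm. The 1-D setting, the use of k-means as the parametric step, the nearest-core-neighbor rule as the nonparametric step, and the names `kde`, `kmeans_1d`, `hybrid_cluster`, `bandwidth`, and `core_fraction` are all assumptions made for this example.

```python
import math

def kde(points, bandwidth):
    """Gaussian kernel density estimate evaluated at each input point (1-D)."""
    norm = len(points) * bandwidth * math.sqrt(2 * math.pi)
    return [
        sum(math.exp(-0.5 * ((x - y) / bandwidth) ** 2) for y in points) / norm
        for x in points
    ]

def kmeans_1d(points, centers, iters=50):
    """Plain Lloyd iterations in 1-D; stands in for the parametric step."""
    centers = list(centers)
    for _ in range(iters):
        groups = [[] for _ in centers]
        for x in points:
            j = min(range(len(centers)), key=lambda c: abs(x - centers[c]))
            groups[j].append(x)
        # Keep the old center if a group ends up empty.
        new_centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers

def hybrid_cluster(points, bandwidth=0.5, core_fraction=0.9):
    """Two-cluster sketch: k-means on the dense core, nearest core neighbor elsewhere."""
    dens = kde(points, bandwidth)
    # Indices sorted from highest to lowest estimated density.
    order = sorted(range(len(points)), key=lambda i: dens[i], reverse=True)
    n_core = min(len(points), max(2, round(core_fraction * len(points))))
    core_idx = order[:n_core]
    core = [points[i] for i in core_idx]

    # Parametric-style step on the core only (assumed: two clusters,
    # initialized at the smallest and largest core values).
    centers = kmeans_1d(core, [min(core), max(core)])

    labels = [None] * len(points)
    for i in core_idx:
        labels[i] = min(range(len(centers)), key=lambda c: abs(points[i] - centers[c]))
    # Nonparametric step: a low-density point copies the label of its
    # nearest core point and never influences the cluster centers.
    for i in order[n_core:]:
        nearest = min(core_idx, key=lambda j: abs(points[i] - points[j]))
        labels[i] = labels[nearest]
    return centers, labels

# Toy data: two tight groups plus one far outlier.
data = [0.0, 0.1, 0.2, 0.3, 0.4, 5.0, 5.1, 5.2, 5.3, 5.4, 30.0]
naive_centers = kmeans_1d(data, [min(data), max(data)])
robust_centers, robust_labels = hybrid_cluster(data)
```

On this toy data, plain k-means started from the minimum and maximum lets the outlier at 30.0 capture a center of its own and merges the two real groups around 2.7, whereas the hybrid sketch recovers centers near 0.2 and 5.2 and attaches the outlier to the nearer group. The example only demonstrates the outlier sensitivity the abstract mentions; it says nothing about the performance figures reported in the paper.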


Similar articles

1. Semiparametric Clustering: A Robust Alternative to Parametric Clustering.
IEEE Trans Neural Netw Learn Syst. 2019 Sep;30(9):2583-2597. doi: 10.1109/TNNLS.2018.2884790. Epub 2018 Dec 28.
2. Semiparametric clustering method for microarray data analysis.
J Bioinform Comput Biol. 2008 Apr;6(2):261-82. doi: 10.1142/s021972000800345x.
3. A non-parametric Bayesian approach for clustering and tracking non-stationarities of neural spikes.
J Neurosci Methods. 2014 Feb 15;223:85-91. doi: 10.1016/j.jneumeth.2013.12.005. Epub 2013 Dec 12.
4. Parametric and nonparametric population methods: their comparative performance in analysing a clinical dataset and two Monte Carlo simulation studies.
Clin Pharmacokinet. 2006;45(4):365-83. doi: 10.2165/00003088-200645040-00003.
5. Accelerated failure time modeling via nonparametric mixtures.
Biometrics. 2023 Mar;79(1):165-177. doi: 10.1111/biom.13556. Epub 2021 Sep 20.
6. Fast clustering using adaptive density peak detection.
Stat Methods Med Res. 2017 Dec;26(6):2800-2811. doi: 10.1177/0962280215609948. Epub 2015 Oct 16.
7. Asymptotic Properties for Methods Combining the Minimum Hellinger Distance Estimate and the Bayesian Nonparametric Density Estimate.
Entropy (Basel). 2018 Dec 11;20(12):955. doi: 10.3390/e20120955.
8. Classification based on hybridization of parametric and nonparametric classifiers.
IEEE Trans Pattern Anal Mach Intell. 2009 Jul;31(7):1153-64. doi: 10.1109/TPAMI.2008.149.
9. Visual MRI: merging information visualization and non-parametric clustering techniques for MRI dataset analysis.
Artif Intell Med. 2008 Nov;44(3):183-99. doi: 10.1016/j.artmed.2008.06.006. Epub 2008 Sep 4.
10. Robust Bayesian clustering.
Neural Netw. 2007 Jan;20(1):129-38. doi: 10.1016/j.neunet.2006.06.009. Epub 2006 Sep 29.

Cited by

1. Developing a predictive signature for two trial endpoints using the cross-validated risk scores method.
Biostatistics. 2023 Apr 14;24(2):327-344. doi: 10.1093/biostatistics/kxaa055.