Determining the number of clusters using the weighted gap statistic.

Author information

Yan Mingjin, Ye Keying

Affiliation

Medtronic Sofamor Danek, 1800 Pyramid Place, Memphis, Tennessee 38132, USA.

Publication information

Biometrics. 2007 Dec;63(4):1031-7. doi: 10.1111/j.1541-0420.2007.00784.x. Epub 2007 Apr 9.

Abstract

Estimating the number of clusters in a data set is a crucial step in cluster analysis. In this article, motivated by the gap method (Tibshirani, Walther, and Hastie, 2001, Journal of the Royal Statistical Society B 63, 411-423), we propose the weighted gap and the difference of difference-weighted (DD-weighted) gap methods for estimating the number of clusters in data using the weighted within-clusters sum of errors, a measure of within-cluster homogeneity. In addition, we propose a "multilayer" clustering approach, which is shown to be more accurate than the original gap method, particularly in detecting the nested cluster structure of the data. The methods are applicable when the input data contain continuous measurements and can be used with any clustering method. Simulation studies and real data are used to compare the proposed methods with one another and with the original gap method.
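The abstract does not state the formulas, so the following is only a minimal sketch of the gap statistic of Tibshirani, Walther, and Hastie (2001) that the article builds on. It uses a uniform reference distribution over the range of each feature. The function names (`within_dispersion`, `gap_curve`, `choose_k`), the `cluster_fn` callable, and in particular the `weighted=True` branch are assumptions made for illustration: the weighting shown, which averages squared distances over pairs within each cluster, is one plausible reading of "weighted within-clusters sum of errors" and should be checked against the article itself. The DD-weighted and multilayer procedures are not sketched.

```python
import numpy as np


def within_dispersion(X, labels, weighted=False):
    """Pooled within-cluster dispersion W_k for a given labelling.

    Unweighted: W_k = sum_r D_r / (2 n_r), where D_r is the sum of squared
    Euclidean distances over all ordered pairs in cluster r.
    weighted=True divides by 2 n_r (n_r - 1) instead. That form is an
    ASSUMPTION for illustration, not taken from the abstract.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for r in np.unique(labels):
        C = X[labels == r]
        n_r = len(C)
        if n_r < 2:
            continue  # a singleton cluster has no pairwise distances
        # Identity: D_r = 2 n_r * (sum of squared distances to the centroid)
        d_r = 2.0 * n_r * ((C - C.mean(axis=0)) ** 2).sum()
        total += d_r / (2.0 * n_r * (n_r - 1) if weighted else 2.0 * n_r)
    return total


def gap_curve(X, cluster_fn, k_max, n_ref=10, weighted=False, seed=0):
    """Gap(k) and s_k for k = 1..k_max.

    cluster_fn(data, k) is a hypothetical callable returning one integer
    label per row; any clustering method can be plugged in here.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    gaps, s = np.empty(k_max), np.empty(k_max)
    for k in range(1, k_max + 1):
        log_w = np.log(within_dispersion(X, cluster_fn(X, k), weighted))
        ref = np.empty(n_ref)
        for b in range(n_ref):
            # Reference data: uniform over the bounding box of X
            Z = rng.uniform(lo, hi, size=X.shape)
            ref[b] = np.log(within_dispersion(Z, cluster_fn(Z, k), weighted))
        gaps[k - 1] = ref.mean() - log_w
        s[k - 1] = ref.std() * np.sqrt(1.0 + 1.0 / n_ref)
    return gaps, s


def choose_k(gaps, s):
    """Smallest k with Gap(k) >= Gap(k+1) - s_{k+1} (original gap rule)."""
    for k in range(1, len(gaps)):
        if gaps[k - 1] >= gaps[k] - s[k]:
            return k
    return len(gaps)
```

To use the sketch, pass any function that maps a data matrix and a cluster count to integer labels as `cluster_fn`, then call `choose_k` on the returned `gaps` and `s`. Keeping the clustering step as a parameter mirrors the abstract's statement that the methods can be used with any clustering method.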
