Haslbeck Jonas M B, Wulff Dirk U
Psychological Methods Group, University of Amsterdam, Amsterdam, The Netherlands.
Center for Cognitive and Decision Science, University of Basel, Basel, Switzerland.
Comput Stat. 2020;35(4):1879-1894. doi: 10.1007/s00180-020-00981-5. Epub 2020 May 18.
We improve instability-based methods for selecting the number of clusters k in cluster analysis by developing a corrected clustering distance that removes the unwanted influence of the distribution of cluster sizes on cluster instability. We show that our corrected instability measure outperforms current instability-based measures across the whole sequence of possible k, overcoming limitations of current instability-based methods for large k. We also compare, for the first time, model-based and model-free approaches to determining cluster instability and find their performance to be comparable. We make our method available in the R package cstab.
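To make the idea of instability-based selection concrete, the following is a minimal Python sketch of an uncorrected, model-free instability estimate. It is not the authors' method and not the cstab API: the function names (`init_centers`, `kmeans`, `clustering_distance`, `instability`), the use of k-means, the pair-disagreement distance, the subsample fraction, and the toy data are all assumptions made for illustration. The correction for cluster-size distribution described in the abstract is not reproduced here.

```python
import numpy as np

def init_centers(X, k, rng):
    """k-means++ style seeding: spread the initial centers out."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        # Squared distance of every point to its nearest chosen center.
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)

def kmeans(X, k, rng, n_init=3, n_iter=30):
    """Plain Lloyd's algorithm with restarts; returns the lowest-cost labels."""
    best_labels, best_cost = None, np.inf
    for _ in range(n_init):
        centers = init_centers(X, k, rng)
        for _ in range(n_iter):
            d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = d.argmin(axis=1)
            for j in range(k):
                if np.any(labels == j):  # keep the old center if a cluster is empty
                    centers[j] = X[labels == j].mean(axis=0)
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        cost = d.min(axis=1).sum()
        if cost < best_cost:
            best_labels, best_cost = d.argmin(axis=1), cost
    return best_labels

def clustering_distance(a, b):
    """Fraction of point pairs on which two labelings disagree about co-membership."""
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    iu = np.triu_indices(len(a), k=1)  # each unordered pair once
    return float(np.mean(same_a[iu] != same_b[iu]))

def instability(X, k, rng, n_pairs=10, frac=0.8):
    """Average clustering distance over pairs of subsamples, on their shared points."""
    n, m = len(X), int(frac * len(X))
    dists = []
    for _ in range(n_pairs):
        i1 = rng.choice(n, size=m, replace=False)
        i2 = rng.choice(n, size=m, replace=False)
        l1, l2 = kmeans(X[i1], k, rng), kmeans(X[i2], k, rng)
        # Positions of the shared points within each subsample.
        _, p1, p2 = np.intersect1d(i1, i2, return_indices=True)
        dists.append(clustering_distance(l1[p1], l2[p2]))
    return float(np.mean(dists))

# Toy data (assumed for illustration): three well-separated Gaussian blobs.
rng = np.random.default_rng(0)
true_centers = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 9.0]])
X = np.vstack([c + 0.5 * rng.standard_normal((60, 2)) for c in true_centers])

scores = {k: instability(X, k, rng) for k in range(2, 7)}
best_k = min(scores, key=scores.get)  # k with the lowest estimated instability
print(scores, best_k)
```

The sketch compares clusterings only on points shared by both subsamples, which is one way to realize a model-free comparison. Because it uses the raw pair-disagreement distance, its scores remain subject to the cluster-size effect that the abstract's corrected distance is designed to remove.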