IEEE Trans Neural Netw Learn Syst. 2017 Dec;28(12):3007-3017. doi: 10.1109/TNNLS.2016.2608001. Epub 2016 Oct 5.
Determining the optimal number of clusters is crucial to clustering quality in cluster analysis. From the standpoint of sample geometry, two concepts are defined, namely the sample clustering dispersion degree and the sample clustering synthesis degree, and a new clustering validity index is designed. In addition, a method for determining the optimal number of clusters based on an agglomerative hierarchical clustering (AHC) algorithm is proposed. The new index and method can evaluate the clustering results produced by AHC and determine the optimal number of clusters for multiple types of datasets, including linear, manifold, annular, and convex structures. Theoretical analysis and experimental results indicate the validity and good performance of the proposed index and method.
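The general workflow described in the abstract (run AHC, score the partition obtained at each candidate number of clusters with a validity index, and keep the best-scoring number) can be illustrated with a minimal sketch. The abstract does not define the proposed index, so the code below substitutes the well-known mean silhouette coefficient as a stand-in and uses single-linkage merging as an assumed linkage rule; neither is claimed to be the authors' choice. The function names `ahc_partitions`, `silhouette`, and `choose_k` and the toy data are hypothetical.

```python
import math
from itertools import combinations


def ahc_partitions(points):
    """Single-linkage AHC (assumed linkage). Returns {k: clusters} for k = n..1,
    where each cluster is a list of point indices."""
    clusters = [[i] for i in range(len(points))]
    partitions = {len(clusters): [c[:] for c in clusters]}
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters with the smallest single-linkage distance.
        for a, b in combinations(range(len(clusters)), 2):
            d = min(math.dist(points[i], points[j])
                    for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
        partitions[len(clusters)] = [c[:] for c in clusters]
    return partitions


def silhouette(points, clusters):
    """Mean silhouette coefficient: a stand-in validity index, not the paper's."""
    scores = []
    for ci, cluster in enumerate(clusters):
        for i in cluster:
            if len(cluster) == 1:
                scores.append(0.0)  # usual convention for singleton clusters
                continue
            # a: mean distance to the other members of the same cluster.
            a = sum(math.dist(points[i], points[j])
                    for j in cluster if j != i) / (len(cluster) - 1)
            # b: smallest mean distance to the members of any other cluster.
            b = min(sum(math.dist(points[i], points[j]) for j in other) / len(other)
                    for cj, other in enumerate(clusters) if cj != ci)
            denom = max(a, b)
            scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def choose_k(points, k_min=2, k_max=None):
    """Score every AHC partition with k_min <= k <= k_max and return the best k."""
    partitions = ahc_partitions(points)
    if k_max is None:
        k_max = len(points) - 1
    scores = {k: silhouette(points, partitions[k]) for k in range(k_min, k_max + 1)}
    return max(scores, key=scores.get), scores


# Toy data: three well-separated 2x2 groups of points (illustrative only).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 0), (10, 1), (11, 0), (11, 1),
          (5, 10), (5, 11), (6, 10), (6, 11)]

best_k, scores = choose_k(points)
print(best_k)  # 3
```

Replacing `silhouette` with another index only requires a function that takes the points and a partition and returns a score to maximize, which is where an index such as the one proposed in the paper would plug in.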