Santos Jorge M, Marques de Sa Joaquim, Alexandre Luis A
Department of Mathematics, ISEP- Polytechnic, School of Engineering, Porto, Portugal.
IEEE Trans Pattern Anal Mach Intell. 2008 Jan;30(1):62-75. doi: 10.1109/TPAMI.2007.1142.
Hierarchical clustering is a stepwise clustering method usually based on proximity measures between objects or sets of objects from a given data set. The most common proximity measures are distance measures. The derived proximity matrices can be used to build graphs, which provide the basic structure for some clustering methods. We present here a new proximity matrix based on an entropic measure, together with a clustering algorithm (LEGClust) that builds layers of subgraphs from this matrix and combines them with a hierarchical agglomerative clustering technique to form the clusters. Our approach capitalizes on both a graph structure and a hierarchical construction. Moreover, by using entropy as a proximity measure we are able, with no assumption about the cluster shapes, to capture the local structure of the data, forcing the clustering method to reflect this structure. We present several experiments on artificial and real data sets that provide evidence of the superior performance of this new algorithm when compared with competing ones.
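The pipeline described above (entropic proximity, layered subgraphs, agglomeration) can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not define the entropic measure. The sketch assumes Renyi's quadratic entropy, estimated with a Gaussian kernel over the connection vectors in each point's neighbourhood. The function names (`renyi_quadratic_entropy`, `entropic_ranking`, `layer_labels`), the parameters `m` and `sigma`, and the cumulative merging of layers as a stand-in for the agglomerative step are all hypothetical choices made for illustration.

```python
import numpy as np

def renyi_quadratic_entropy(vectors, sigma=1.0):
    """Renyi quadratic entropy of a vector set (Gaussian kernel estimate).

    Assumption: this estimator is chosen for illustration only.
    """
    v = np.asarray(vectors, dtype=float)
    d = v.shape[1]
    sq = ((v[:, None, :] - v[None, :, :]) ** 2).sum(axis=-1)
    # Two Gaussians of variance sigma^2 convolve to variance 2*sigma^2.
    norm = (4.0 * np.pi * sigma ** 2) ** (-d / 2.0)
    potential = (norm * np.exp(-sq / (4.0 * sigma ** 2))).mean()
    return -np.log(potential)

def entropic_ranking(X, m=5, sigma=1.0):
    """For each point, rank its m nearest neighbours by entropic cost.

    Row i lists neighbour indices, lowest entropy first. The cost of
    linking i to j is the entropy of the local connection vectors plus
    the candidate vector X[j] - X[i] (a hypothetical proximity measure).
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    ranking = np.empty((n, m), dtype=int)
    off_diagonal = ~np.eye(m, dtype=bool)
    for i in range(n):
        order = np.argsort(dist[i])
        nbrs = order[order != i][:m]
        local = X[nbrs]
        # Connection vectors between every ordered pair of neighbours.
        conn = (local[:, None, :] - local[None, :, :])[off_diagonal]
        scores = [
            renyi_quadratic_entropy(np.vstack([conn, X[j] - X[i]]), sigma)
            for j in nbrs
        ]
        ranking[i] = nbrs[np.argsort(scores)]
    return ranking

def layer_labels(ranking, upto=0):
    """Connected components of the union of layers 0..upto.

    Layer k links each point to its k-th ranked neighbour. Merging
    successive layers is a simplified stand-in for agglomeration.
    """
    n = ranking.shape[0]
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for k in range(upto + 1):
        for i in range(n):
            ra, rb = find(i), find(int(ranking[i, k]))
            if ra != rb:
                parent[rb] = ra
    roots = [find(i) for i in range(n)]
    _, labels = np.unique(roots, return_inverse=True)
    return labels

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(8, 1, (20, 2))])
    ranks = entropic_ranking(data, m=5, sigma=1.0)
    for depth in range(3):
        labels = layer_labels(ranks, upto=depth)
        print(depth, len(set(labels.tolist())))
```

Adding layers can only merge components, so the number of clusters is non-increasing in `upto`; this gives a coarse hierarchy from fine to coarse. A faithful implementation would need the entropic measure and merging rule defined in the paper itself.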