Bien Jacob, Tibshirani Robert
Department of Statistics, Stanford University, Stanford, CA 94305.
Department of Health Research and Policy and Department of Statistics, Stanford University, Stanford, CA 94305.
J Am Stat Assoc. 2011;106(495):1075-1084. doi: 10.1198/jasa.2011.tm10183.
Agglomerative hierarchical clustering is a popular class of methods for understanding the structure of a dataset. The nature of the clustering depends on the choice of linkage, that is, on how one measures the distance between clusters. In this article we investigate minimax linkage, a recently introduced but little-studied linkage. Minimax linkage is unique in naturally associating a prototype chosen from the original dataset with every interior node of the dendrogram. These prototypes can be used to greatly enhance the interpretability of a hierarchical clustering. Furthermore, we prove that minimax linkage has a number of desirable theoretical properties; for example, minimax-linkage dendrograms cannot have inversions (unlike those produced by centroid linkage), and minimax linkage is robust against certain perturbations of a dataset. We provide an efficient implementation and illustrate minimax linkage's strengths as a data analysis and visualization tool on a study of words from encyclopedia articles and on a dataset of images of human faces.
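To make the idea of a prototype at every interior node concrete, here is a minimal Python sketch. It assumes the usual definition of minimax linkage, which the abstract does not spell out: the distance between clusters G and H is the smallest radius of a ball, centered at some point of G ∪ H, that covers all of G ∪ H, and the center achieving that radius is the prototype. The function names `minimax_merge` and `minimax_clustering` are hypothetical, Euclidean distance is assumed, and the brute-force search is for illustration only; it is not the efficient implementation the abstract refers to.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def minimax_merge(points, g, h):
    """Return (minimax linkage, prototype index) for merging clusters g and h.

    g and h are tuples of indices into points (assumed representation).
    """
    union = list(g) + list(h)
    # Radius needed around each candidate center to cover the whole union.
    radius = {i: max(dist(points[i], points[j]) for j in union) for i in union}
    # The prototype is the member with the smallest covering radius.
    proto = min(union, key=radius.get)
    return radius[proto], proto


def minimax_clustering(points):
    """Naive agglomerative clustering; returns (members, height, prototype) per merge."""
    clusters = [(i,) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        # Try every pair of current clusters and keep the smallest linkage.
        (height, proto), g, h = min(
            ((minimax_merge(points, g, h), g, h)
             for g, h in combinations(clusters, 2)),
            key=lambda t: t[0][0],
        )
        clusters = [c for c in clusters if c not in (g, h)] + [g + h]
        merges.append((g + h, height, proto))
    return merges


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (2, 0), (10, 0)]
    for members, height, proto in minimax_clustering(pts):
        print(sorted(members), height, pts[proto])
```

Each merge reports a height and an actual data point as prototype, so every interior node of the resulting dendrogram can be labeled with a representative observation; every point in the merged cluster lies within the merge height of that prototype.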