Hierarchical Clustering With Prototypes via Minimax Linkage.

Author information

Jacob Bien, Robert Tibshirani

Affiliations

Department of Statistics, Stanford University, Stanford, CA 94305.

Department of Health Research and Policy and Department of Statistics, Stanford University, Stanford, CA 94305.

Publication information

J Am Stat Assoc. 2011;106(495):1075-1084. doi: 10.1198/jasa.2011.tm10183.

Abstract

Agglomerative hierarchical clustering is a popular class of methods for understanding the structure of a dataset. The nature of the clustering depends on the choice of linkage, that is, on how one measures the distance between clusters. In this article we investigate minimax linkage, a recently introduced but little-studied linkage. Minimax linkage is unique in naturally associating a prototype chosen from the original dataset with every interior node of the dendrogram. These prototypes can be used to greatly enhance the interpretability of a hierarchical clustering. Furthermore, we prove that minimax linkage has a number of desirable theoretical properties; for example, minimax-linkage dendrograms cannot have inversions (unlike centroid linkage) and are robust against certain perturbations of a dataset. We provide an efficient implementation and illustrate minimax linkage's strengths as a data analysis and visualization tool on a study of words from encyclopedia articles and on a dataset of images of human faces.
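To make the linkage concrete: for a cluster C, let r(x, C) = max over x' in C of d(x, x'), the largest dissimilarity from a candidate point x to any member. Minimax linkage sets d(G, H) = min over x in G∪H of r(x, G∪H), the radius of the smallest ball centered at a data point that covers the merged cluster; the point achieving that minimum is the merge's prototype. The Python below is a naive sketch under stated assumptions, not the authors' efficient implementation: it takes a precomputed symmetric dissimilarity matrix, recomputes every candidate merge at each step, and breaks ties by taking the first candidate found. The function names `minimax_radius` and `minimax_hclust` are hypothetical.

```python
import numpy as np


def minimax_radius(D, members):
    """Return (radius, prototype) for a cluster given as a list of point indices.

    radius    = min over candidates x in the cluster of max distance from x to any member
    prototype = the minimizing x (ties go to the first position in `members`)
    """
    sub = D[np.ix_(members, members)]   # pairwise dissimilarities within the cluster
    max_dist = sub.max(axis=1)          # r(x, C) for each candidate x
    best = int(np.argmin(max_dist))
    return float(max_dist[best]), members[best]


def minimax_hclust(D):
    """Naive agglomerative clustering with minimax linkage (hypothetical helper).

    D: symmetric (n, n) array of dissimilarities with zero diagonal.
    Returns a list of merges (cluster_a, cluster_b, height, prototype),
    where clusters are tuples of original point indices.
    """
    D = np.asarray(D, dtype=float)
    clusters = [[i] for i in range(D.shape[0])]
    merges = []
    while len(clusters) > 1:
        best = None
        # Try every pair of current clusters; the linkage is the minimax radius of their union.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                h, proto = minimax_radius(D, clusters[a] + clusters[b])
                if best is None or h < best[0]:
                    best = (h, proto, a, b)
        h, proto, a, b = best
        merged = clusters[a] + clusters[b]
        merges.append((tuple(clusters[a]), tuple(clusters[b]), h, proto))
        # Delete the larger index first (b > a) so that index a stays valid.
        del clusters[b]
        del clusters[a]
        clusters.append(merged)
    return merges


if __name__ == "__main__":
    # Toy data: five points on a line, two well-separated groups.
    x = np.array([0.0, 1.0, 2.0, 10.0, 11.0])
    D = np.abs(x[:, None] - x[None, :])
    for a, b, h, proto in minimax_hclust(D):
        print(a, b, h, proto)
    # last line printed: (2, 0, 1) (3, 4) 9.0 2
```

On this toy input the merge heights come out as 1, 1, 1, 9 and never decrease, consistent with the no-inversion property stated above, and each merge reports an actual data point (for the root, the point at x = 2) as its prototype. Every merge requires rescanning all cluster pairs, so this sketch is only practical for small n.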
