Suppr超能文献

模糊谱聚类的高效不确定性最小化

Efficient uncertainty minimization for fuzzy spectral clustering.

作者信息

White Brian S, Shalloway David

机构信息

Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14853, USA.

出版信息

Phys Rev E Stat Nonlin Soft Matter Phys. 2009 Nov;80(5 Pt 2):056705. doi: 10.1103/PhysRevE.80.056705. Epub 2009 Nov 24.

Abstract

Spectral clustering uses the global information embedded in eigenvectors of an inter-item similarity matrix to correctly identify clusters of irregular shape, an ability lacking in commonly used approaches such as k -means and agglomerative clustering. However, traditional spectral clustering partitions items into hard clusters, and the ability to instead generate fuzzy item assignments would be advantageous for the growing class of domains in which cluster overlap and uncertainty are important. Korenblum and Shalloway [Phys. Rev. E 67, 056704 (2003)] extended spectral clustering to fuzzy clustering by introducing the principle of uncertainty minimization. However, this posed a challenging nonconvex global optimization problem that they solved by a brute-force technique unlikely to scale to data sets having more than O(10;{2}) items. Here we develop a method for solving the minimization problem, which can handle data sets at least two orders of magnitude larger. In doing so, we elucidate the underlying structure of uncertainty minimization using multiple geometric representations. This enables us to show how fuzzy spectral clustering using uncertainty minimization is related to and generalizes clustering motivated by perturbative analysis of almost-block-diagonal matrices. Uncertainty minimization can be applied to a wide variety of existing hard spectral clustering approaches, thus transforming them to fuzzy methods.

摘要

谱聚类利用嵌入在项间相似性矩阵特征向量中的全局信息来正确识别不规则形状的聚类,这是诸如k均值和凝聚聚类等常用方法所缺乏的能力。然而,传统的谱聚类将项划分为硬聚类,而对于聚类重叠和不确定性很重要的不断增长的领域类别而言,生成模糊项分配的能力将更具优势。科伦布卢姆和沙洛维[《物理评论E》67, 056704 (2003)]通过引入不确定性最小化原理将谱聚类扩展到模糊聚类。然而,这带来了一个具有挑战性的非凸全局优化问题,他们通过一种暴力技术解决了该问题,但这种技术不太可能扩展到具有超过O(10²)个项的数据集。在此,我们开发了一种解决最小化问题的方法,该方法可以处理至少比现有数据集大两个数量级的数据集。在此过程中,我们使用多种几何表示阐明了不确定性最小化的底层结构。这使我们能够展示使用不确定性最小化的模糊谱聚类与由几乎块对角矩阵的微扰分析驱动的聚类之间的关系,并对其进行推广。不确定性最小化可以应用于各种各样现有的硬谱聚类方法,从而将它们转化为模糊方法。

文献AI研究员

20分钟写一篇综述,助力文献阅读效率提升50倍。

立即体验

用中文搜PubMed

大模型驱动的PubMed中文搜索引擎

马上搜索

文档翻译

学术文献翻译模型,支持多种主流文档格式。

立即体验