White Brian S, Shalloway David
Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14853, USA.
Phys Rev E Stat Nonlin Soft Matter Phys. 2009 Nov;80(5 Pt 2):056705. doi: 10.1103/PhysRevE.80.056705. Epub 2009 Nov 24.
Spectral clustering uses the global information embedded in eigenvectors of an inter-item similarity matrix to correctly identify clusters of irregular shape, an ability lacking in commonly used approaches such as k -means and agglomerative clustering. However, traditional spectral clustering partitions items into hard clusters, and the ability to instead generate fuzzy item assignments would be advantageous for the growing class of domains in which cluster overlap and uncertainty are important. Korenblum and Shalloway [Phys. Rev. E 67, 056704 (2003)] extended spectral clustering to fuzzy clustering by introducing the principle of uncertainty minimization. However, this posed a challenging nonconvex global optimization problem that they solved by a brute-force technique unlikely to scale to data sets having more than O(10;{2}) items. Here we develop a method for solving the minimization problem, which can handle data sets at least two orders of magnitude larger. In doing so, we elucidate the underlying structure of uncertainty minimization using multiple geometric representations. This enables us to show how fuzzy spectral clustering using uncertainty minimization is related to and generalizes clustering motivated by perturbative analysis of almost-block-diagonal matrices. Uncertainty minimization can be applied to a wide variety of existing hard spectral clustering approaches, thus transforming them to fuzzy methods.
谱聚类利用嵌入在项间相似性矩阵特征向量中的全局信息来正确识别不规则形状的聚类,这是诸如k均值和凝聚聚类等常用方法所缺乏的能力。然而,传统的谱聚类将项划分为硬聚类,而对于聚类重叠和不确定性很重要的不断增长的领域类别而言,生成模糊项分配的能力将更具优势。科伦布卢姆和沙洛维[《物理评论E》67, 056704 (2003)]通过引入不确定性最小化原理将谱聚类扩展到模糊聚类。然而,这带来了一个具有挑战性的非凸全局优化问题,他们通过一种暴力技术解决了该问题,但这种技术不太可能扩展到具有超过O(10²)个项的数据集。在此,我们开发了一种解决最小化问题的方法,该方法可以处理至少比现有数据集大两个数量级的数据集。在此过程中,我们使用多种几何表示阐明了不确定性最小化的底层结构。这使我们能够展示使用不确定性最小化的模糊谱聚类与由几乎块对角矩阵的微扰分析驱动的聚类之间的关系,并对其进行推广。不确定性最小化可以应用于各种各样现有的硬谱聚类方法,从而将它们转化为模糊方法。