University of Bielefeld, 33501 Bielefeld, Germany.
Neural Comput. 2010 Sep 1;22(9):2229-84. doi: 10.1162/NECO_a_00012.
Topographic maps such as the self-organizing map (SOM) or neural gas (NG) constitute powerful data mining techniques that allow simultaneously clustering data and inferring their topological structure, such that additional features, for example, browsing, become available. Both methods have been introduced for vectorial data sets; they require a classical feature encoding of information. Often data are available in the form of pairwise distances only, such as arise from a kernel matrix, a graph, or some general dissimilarity measure. In such cases, NG and SOM cannot be applied directly. In this article, we introduce relational topographic maps as an extension of relational clustering algorithms, which offer prototype-based representations of dissimilarity data, to incorporate neighborhood structure. These methods are equivalent to the standard (vectorial) techniques if a Euclidean embedding exists, while preventing the need to explicitly compute such an embedding. Extending these techniques for the general case of non-Euclidean dissimilarities makes possible an interpretation of relational clustering as clustering in pseudo-Euclidean space. We compare the methods to well-known clustering methods for proximity data based on deterministic annealing and discuss how far convergence can be guaranteed in the general case. Relational clustering is quadratic in the number of data points, which makes the algorithms infeasible for huge data sets. We propose an approximate patch version of relational clustering that runs in linear time. The effectiveness of the methods is demonstrated in a number of examples.
地形地图,如自组织映射(SOM)或神经气体(NG),构成了强大的数据挖掘技术,允许同时对数据进行聚类并推断其拓扑结构,从而提供了其他功能,例如浏览。这两种方法都已被引入到向量数据集;它们需要对信息进行经典的特征编码。通常,数据仅以成对距离的形式出现,例如核矩阵、图或某些一般相似度度量中出现的距离。在这种情况下,NG 和 SOM 不能直接应用。在本文中,我们引入了关系地形图作为关系聚类算法的扩展,该算法为相似性数据提供基于原型的表示形式,以合并邻域结构。如果存在欧几里得嵌入,则这些方法等同于标准(向量)技术,同时避免了显式计算此类嵌入的需要。将这些技术扩展到非欧几里得相似度的一般情况,使得关系聚类可以解释为伪欧几里得空间中的聚类。我们将这些方法与基于确定性退火的基于相似性数据的知名聚类方法进行比较,并讨论在一般情况下可以保证多远的收敛性。关系聚类在数据点的数量上是二次的,这使得算法对于庞大的数据集不可行。我们提出了一种关系聚类的近似补丁版本,其运行时间为线性。在许多示例中证明了这些方法的有效性。