Conan-Guez Brieuc, Rossi Fabrice, El Golli Aïcha
LITA EA3097, Université de Metz, Ile du Saulcy, F-57045 Metz, France.
Neural Netw. 2006 Jul-Aug;19(6-7):855-63. doi: 10.1016/j.neunet.2006.05.002. Epub 2006 Jun 12.
In many real-world applications, data cannot be accurately represented by vectors. In those situations, one possible solution is to rely on dissimilarity measures that enable a sensible comparison between observations. Kohonen's self-organizing map (SOM) has been adapted to data described only through their dissimilarity matrix. This algorithm provides both nonlinear projection and clustering of nonvector data. Unfortunately, the algorithm suffers from a high cost that makes it quite difficult to use with voluminous data sets. In this paper, we propose a new algorithm that provides an important reduction in the theoretical cost of the dissimilarity SOM without changing its outcome (the results are exactly the same as those obtained with the original algorithm). Moreover, we introduce implementation methods that result in very short running times. Improvements deduced from the theoretical cost model are validated on simulated and real-world data (a word list clustering problem). We also demonstrate that the proposed implementation methods reduce the running time of the fast algorithm by a factor up to three over a standard implementation.
在许多实际应用中,数据无法用向量准确表示。在这些情况下,一种可能的解决方案是依靠能够对观测值进行合理比较的差异度量。科霍宁的自组织映射(SOM)已被应用于仅通过其差异矩阵描述的数据。该算法提供了非向量数据的非线性投影和聚类。不幸的是,该算法成本高昂,这使得它在处理大量数据集时相当困难。在本文中,我们提出了一种新算法,该算法在不改变其结果(结果与原始算法完全相同)的情况下,大幅降低了差异SOM的理论成本。此外,我们还介绍了能显著缩短运行时间的实现方法。从理论成本模型得出的改进在模拟数据和实际数据(一个单词列表聚类问题)上得到了验证。我们还证明,与标准实现相比,所提出的实现方法将快速算法的运行时间缩短了多达三倍。