Haese K, Goodhill G J
Data Warehouse/Data Mining, Mummert & Partners Management Consulting, Braunschweig, D-38104, Germany.
Neural Comput. 2001 Mar;13(3):595-619. doi: 10.1162/089976601300014475.
An important technique for exploratory data analysis is to form a mapping from the high-dimensional data space to a low-dimensional representation space such that neighborhoods are preserved. A popular method for achieving this is Kohonen's self-organizing map (SOM) algorithm. In its original form, however, the algorithm requires the user to choose the values of several parameters heuristically to achieve good performance. Here we present the Auto-SOM, an algorithm that automatically estimates the learning parameters during the training of SOMs. Auto-SOM makes it possible to avoid neighborhood violations, up to a user-defined degree, in either mapping direction. It consists of a Kalman filter implementation of the SOM coupled with a recursive parameter estimation method. The Kalman filter trains the neurons' weights with estimated learning coefficients so as to minimize the variance of the estimation error. The recursive parameter estimation method estimates the width of the neighborhood function by minimizing the prediction error variance of the Kalman filter. In addition, the "topographic function" is incorporated to measure neighborhood violations and to prevent the map from converging to configurations that contain them. We demonstrate that neighborhoods can be preserved in both mapping directions, as desired for dimension-reducing applications. The development of neighborhood-preserving maps and their convergence behavior are illustrated by three examples that represent the basic applications of self-organizing feature maps.
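To make the heuristic parameter choices concrete, the following is a minimal sketch of the classic Kohonen SOM update on a one-dimensional chain of neurons. It is not the Auto-SOM: there is no Kalman filter, no recursive parameter estimation, and no topographic function. The function name `train_som`, the linear decay schedules, and the parameters `lr0` and `sigma0` are illustrative assumptions; they stand in for the learning coefficient and the neighborhood width that a user of the original algorithm would have to pick by hand.

```python
import math
import random


def train_som(data, n_units, epochs, lr0, sigma0, seed=0):
    """Train a classic 1-D Kohonen SOM (illustrative sketch, not Auto-SOM).

    data    : non-empty list of equal-length lists of floats
    n_units : number of neurons on the 1-D chain
    lr0     : initial learning rate (hand-picked, assumed schedule)
    sigma0  : initial neighborhood width (hand-picked, assumed schedule)
    """
    rng = random.Random(seed)
    dim = len(data[0])
    # Random initial weights in [0, 1) for every neuron.
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        samples = list(data)
        rng.shuffle(samples)
        for x in samples:
            frac = step / total_steps
            # Heuristic linear schedules: exactly the kind of choice
            # that has to be tuned by hand in the original SOM.
            lr = lr0 * (1.0 - frac) + 0.01 * frac
            sigma = sigma0 * (1.0 - frac) + 0.5 * frac
            # Best-matching unit: neuron whose weight is closest to x.
            bmu = min(
                range(n_units),
                key=lambda i: sum((w - xi) ** 2 for w, xi in zip(weights[i], x)),
            )
            # Pull every neuron toward x, scaled by a Gaussian
            # neighborhood function centered on the winner.
            for i in range(n_units):
                h = math.exp(-((i - bmu) ** 2) / (2.0 * sigma ** 2))
                weights[i] = [w + lr * h * (xi - w) for w, xi in zip(weights[i], x)]
            step += 1
    return weights


if __name__ == "__main__":
    # Map 20 evenly spaced points on [0, 1] onto a chain of 5 neurons.
    points = [[i / 19.0] for i in range(20)]
    for unit in train_som(points, n_units=5, epochs=20, lr0=0.5, sigma0=2.0):
        print(unit)
```

Changing `lr0`, `sigma0`, or the decay schedules can change whether the chain unfolds in order or stays twisted, which is the sensitivity that motivates estimating these quantities automatically during training.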