Lespinats Sylvain, Verleysen Michel, Giron Alain, Fertil Bernard
UMR INSERM, unité 678, Université Pierre et Marie Curie-Paris 6, 75634 Paris, France.
IEEE Trans Neural Netw. 2007 Sep;18(5):1265-79. doi: 10.1109/tnn.2007.891682.
Mapping high-dimensional data into a low-dimensional space, for example for visualization, is a problem of growing concern in data analysis. This paper presents data-driven high-dimensional scaling (DD-HDS), a nonlinear mapping method in the line of multidimensional scaling (MDS), based on preserving the distances between pairs of data points. It improves on existing methods for representing high-dimensional data in two ways. It introduces (1) a specific weighting of distances between data that takes the concentration of measure phenomenon into account and (2) a symmetric handling of short distances in the original and output spaces, which avoids false neighbor representations while still allowing some necessary tears in the original distribution. More precisely, the weighting is set according to the actual distribution of distances in the data set, apart from a single user-defined parameter that sets the tradeoff between local neighborhood preservation and global mapping. The stress criterion designed for the mapping is optimized by "force-directed placement" (FDP). Mappings of low- and high-dimensional data sets are presented to illustrate the features and advantages of the proposed algorithm. The weighting function specific to high-dimensional data and the symmetric handling of short distances can easily be incorporated into most distance-preservation-based nonlinear dimensionality reduction methods.
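The two ingredients named in the abstract, a weighting derived from the distribution of distances and a symmetric treatment of short distances, can be illustrated with a small stress function. This is only a sketch under stated assumptions, not the paper's implementation. The abstract does not give the weighting formula, so the sigmoid used here (one minus a Gaussian CDF whose center and width come from the mean and standard deviation of the original distances) is an assumed form. Reading "symmetric handling" as weighting each pair by the smaller of its original and output distances is likewise an assumption. The names `sigmoid_weight`, `weighted_stress`, and `lam` are hypothetical, and the force-directed placement optimizer is not shown.

```python
import math
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def sigmoid_weight(d, mu, sigma, lam):
    """Assumed weighting: 1 - Gaussian CDF, so short distances weigh more.

    mu and sigma are the mean and standard deviation of the original
    distances; lam (0 < lam <= 1) is the single user-defined parameter
    trading local neighborhood preservation against global mapping.
    The center and width below are illustrative choices.
    """
    center = mu - 2.0 * (1.0 - lam) * sigma
    width = 2.0 * lam * sigma
    z = (d - center) / (width * math.sqrt(2.0))
    erf = np.vectorize(math.erf)
    return 0.5 * (1.0 - erf(z))


def weighted_stress(X_high, Y_low, lam=0.5):
    """Weighted sum of squared distance errors over all pairs."""
    D = pairwise_distances(X_high)   # distances in the original space
    d = pairwise_distances(Y_low)    # distances in the output space
    iu = np.triu_indices(len(X_high), k=1)  # count each pair once
    D, d = D[iu], d[iu]
    mu, sigma = D.mean(), D.std()
    # Symmetric handling (assumed): a pair matters if it is short in
    # EITHER space, which penalizes both false neighbors and tears.
    w = sigmoid_weight(np.minimum(D, d), mu, sigma, lam)
    return float(np.sum(w * (D - d) ** 2))


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 10))
Y = X[:, :2]  # a crude 2-D "mapping": keep the first two coordinates
print(weighted_stress(X, Y, lam=0.5))
```

Because the weight depends on the minimum of the two distances, a pair that is close in the output but far in the original (a false neighbor) is weighted as heavily as a pair that is close in the original but far in the output (a tear). A stress weighted only by original distances would not capture this.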