Yang Li
Department of Computer Science, Western Michigan University, Kalamazoo, MI 49008-5466, USA.
IEEE Trans Pattern Anal Mach Intell. 2008 Mar;30(3):438-50. doi: 10.1109/TPAMI.2007.70706.
Data observations that lie on a manifold can be approximated by a collection of overlapping local patches, the alignment of which in a low dimensional Euclidean space provides an embedding of the data. This paper describes an embedding method using classical multidimensional scaling as a local model based on the fact that a manifold locally resembles an Euclidean space. A set of overlapping neighborhoods are chosen by a greedy approximation algorithm of minimum set cover. Local patches derived from the set of overlapping neighborhoods by classical multidimensional scaling are aligned in order to minimize a residual measure, which has a quadratic form of the resulting global coordinates and can be minimized analytically by solving an eigenvalue problem. This method requires only distances within each neighborhood and provides locally isometric embedding results. The size of the eigenvalue problem scales with the number of overlapping neighborhoods rather than the number of data points. Experiments on both synthetic and real world data sets demonstrate the effectiveness of this method. Extensions and variations of the method are discussed.
位于流形上的数据观测值可以通过一组重叠的局部补丁来近似,这些补丁在低维欧几里得空间中的对齐方式提供了数据的嵌入。本文基于流形在局部类似于欧几里得空间这一事实,描述了一种使用经典多维缩放作为局部模型的嵌入方法。通过最小集覆盖的贪心近似算法选择一组重叠邻域。通过经典多维缩放从重叠邻域集导出的局部补丁进行对齐,以最小化一个残差度量,该残差度量是所得全局坐标的二次形式,并且可以通过求解特征值问题进行解析最小化。该方法仅需要每个邻域内的距离,并提供局部等距嵌入结果。特征值问题的规模与重叠邻域的数量成比例,而不是与数据点的数量成比例。在合成数据集和真实世界数据集上的实验证明了该方法的有效性。还讨论了该方法的扩展和变体。