Departments of Electrical Engineering and Computer Science, Technion-Israel Institute of Technology, Haifa 32000, Israel.
Proc Natl Acad Sci U S A. 2013 Nov 5;110(45):18052-7. doi: 10.1073/pnas.1308708110. Epub 2013 Oct 9.
An important tool in information analysis is dimensionality reduction. There are various approaches for large data simplification by scaling its dimensions down that play a significant role in recognition and classification tasks. The efficiency of dimension reduction tools is measured in terms of memory and computational complexity, which are usually a function of the number of the given data points. Sparse local operators that involve substantially less than quadratic complexity at one end, and faithful multiscale models with quadratic cost at the other end, make the design of dimension reduction procedure a delicate balance between modeling accuracy and efficiency. Here, we combine the benefits of both and propose a low-dimensional multiscale modeling of the data, at a modest computational cost. The idea is to project the classical multidimensional scaling problem into the data spectral domain extracted from its Laplace-Beltrami operator. There, embedding into a small dimensional Euclidean space is accomplished while optimizing for a small number of coefficients. We provide a theoretical support and demonstrate that working in the natural eigenspace of the data, one could reduce the process complexity while maintaining the model fidelity. As examples, we efficiently canonize nonrigid shapes by embedding their intrinsic metric into , a method often used for matching and classifying almost isometric articulated objects. Finally, we demonstrate the method by exposing the style in which handwritten digits appear in a large collection of images. We also visualize clustering of digits by treating images as feature points that we map to a plane.
降维是信息分析中的一个重要工具。有各种方法可以通过缩小数据的维度来简化大数据,这些方法在识别和分类任务中起着重要作用。降维工具的效率是根据内存和计算复杂度来衡量的,这些通常是给定数据点数量的函数。稀疏局部算子在一端涉及的复杂度远低于二次方,而在另一端则是忠实的多尺度模型,其成本为二次方,这使得降维过程的设计在建模精度和效率之间达到了微妙的平衡。在这里,我们结合了两者的优点,并提出了一种低维多尺度数据建模方法,其计算成本适中。其思想是将经典多维标度问题投影到从拉普拉斯-贝尔特拉米算子中提取的数据谱域。在那里,通过优化少量系数来完成在小维度欧几里得空间中的嵌入。我们提供了理论支持,并证明了在数据的自然特征空间中工作,可以在保持模型保真度的同时降低处理复杂度。作为示例,我们通过将其内在度量嵌入到其中,有效地规范了非刚性形状,这是一种常用于匹配和分类几乎等距关节物体的方法。最后,我们通过揭示大量图像中手写数字出现的方式来展示该方法。我们还将图像视为特征点,并将其映射到平面上,从而可视化数字的聚类。