Singer Amit, Wu Hau-Tieng
Department of Mathematics and PACM, Princeton University, Fine Hall, Washington Road, Princeton NJ 08544-1000 USA.
Appl Comput Harmon Anal. 2011 Jul;31(1):44-58. doi: 10.1016/j.acha.2010.10.001.
One of the main objectives in the analysis of a high-dimensional large data set is to learn its geometric and topological structure. Even though the data itself is parameterized as a point cloud in a high-dimensional ambient space ℝ^p, the correlation between parameters often suggests the "manifold assumption" that the data points are distributed on (or near) a low-dimensional Riemannian manifold ℳ^d embedded in ℝ^p, with d ≪ p. We introduce an algorithm that determines the orientability of the intrinsic manifold given a sufficiently large number of sampled data points. If the manifold is orientable, then our algorithm also provides an alternative procedure for computing the eigenfunctions of the Laplacian that are important in the diffusion map framework for reducing the dimensionality of the data. If the manifold is non-orientable, then we provide a modified diffusion mapping of its orientable double covering.
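The abstract does not spell out the procedure, so the following is only a minimal sketch of one way an orientability test on sampled points could look, not the authors' implementation. It assumes tangent spaces are estimated by local PCA, that the relative orientation of two neighbouring tangent bases is read off from the sign of a determinant, and that a global sign assignment is sought through the top eigenvector of the normalised sign matrix. The function name `orientation_signs` and the parameters `d` (assumed intrinsic dimension) and `k` (neighbourhood size) are hypothetical.

```python
import numpy as np

def orientation_signs(X, d, k):
    """Try to assign a consistent orientation sign to every sample.

    X: (n, p) array of points, d: assumed intrinsic dimension,
    k: number of nearest neighbours (hypothetical tuning parameter).
    Returns (signs, agreement); agreement is the fraction of neighbour
    pairs whose relative orientation is explained by the signs.
    """
    n, p = X.shape
    # Pairwise squared distances; column 0 of the argsort is the point itself.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    nbrs = np.argsort(sq, axis=1)[:, 1:k + 1]

    # Local PCA: top-d right singular vectors of the neighbourhood centred at x_i.
    bases = np.empty((n, p, d))
    for i in range(n):
        _, _, vt = np.linalg.svd(X[nbrs[i]] - X[i], full_matrices=False)
        bases[i] = vt[:d].T

    # Z[i, j] is +1 or -1 when the two estimated bases agree or disagree in orientation.
    Z = np.zeros((n, n))
    for i in range(n):
        for j in nbrs[i]:
            s = np.sign(np.linalg.det(bases[i].T @ bases[j]))
            Z[i, j] = Z[j, i] = s

    # Top eigenvector of the symmetrically normalised sign matrix.
    deg = np.abs(Z).sum(axis=1)
    S = Z / np.sqrt(np.outer(deg, deg))
    _, vecs = np.linalg.eigh(S)  # eigenvalues are returned in ascending order
    signs = np.where(vecs[:, -1] >= 0, 1.0, -1.0)

    # Fraction of edges with Z[i, j] == signs[i] * signs[j].
    mask = Z != 0
    agreement = np.mean((Z * np.outer(signs, signs))[mask] > 0)
    return signs, agreement

# Usage on an evenly sampled unit circle (an orientable 1-manifold in the plane).
theta = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
circle = np.column_stack([np.cos(theta), np.sin(theta)])
signs, agreement = orientation_signs(circle, d=1, k=6)
```

Under these assumptions, an agreement of 1 means every neighbouring pair is consistent with a single global choice of orientation, while a value below 1 on a well-sampled, noise-free data set points to a non-orientable manifold such as a Möbius strip. The dense distance matrix keeps the sketch short but limits it to small samples.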