Dyballa Luciano, Zucker Steven W
Department of Computer Science, Yale University, New Haven, CT 06511, U.S.A.
Departments of Computer Science and of Biomedical Engineering, Yale University, New Haven, CT 06511, U.S.A.
Neural Comput. 2023 Feb 17;35(3):453-524. doi: 10.1162/neco_a_01566.
Invoking the manifold assumption in machine learning requires knowledge of the manifold's geometry and dimension, and theory dictates how many samples are required. However, in most applications, the data are limited, sampling may not be uniform, and the manifold's properties are unknown; this implies that neighborhoods must adapt to the local structure. We introduce an algorithm for inferring adaptive neighborhoods for data given by a similarity kernel. Starting with a locally conservative neighborhood (Gabriel) graph, we sparsify it iteratively according to a weighted counterpart. In each step, a linear program yields minimal neighborhoods globally, and a volumetric statistic reveals neighbor outliers likely to violate manifold geometry. We apply our adaptive neighborhoods to nonlinear dimensionality reduction, geodesic computation, and dimension estimation. A comparison against standard algorithms using, for example, k-nearest neighbors, demonstrates the usefulness of our approach.
在机器学习中应用流形假设需要了解流形的几何形状和维度,并且理论规定了所需的样本数量。然而,在大多数应用中,数据是有限的,采样可能不均匀,并且流形的属性是未知的;这意味着邻域必须适应局部结构。我们介绍一种用于为由相似性核给出的数据推断自适应邻域的算法。从局部保守邻域(加布里埃尔)图开始,我们根据加权对应图对其进行迭代稀疏化。在每一步中,一个线性规划全局地产生最小邻域,并且一个体积统计揭示可能违反流形几何形状的邻域离群值。我们将自适应邻域应用于非线性降维、测地线计算和维度估计。与使用例如k近邻的标准算法进行比较,证明了我们方法的有效性。