Sainburg Tim, McInnes Leland, Gentner Timothy Q
University of California San Diego, La Jolla, CA 92093, U.S.A.
Tutte Institute for Mathematics and Computing, Ottawa, Ontario, Canada
Neural Comput. 2021 Oct 12;33(11):2881-2907. doi: 10.1162/neco_a_01434.
UMAP is a nonparametric graph-based dimensionality reduction algorithm using applied Riemannian geometry and algebraic topology to find low-dimensional embeddings of structured data. The UMAP algorithm consists of two steps: (1) computing a graphical representation of a data set (fuzzy simplicial complex) and (2) through stochastic gradient descent, optimizing a low-dimensional embedding of the graph. Here, we extend the second step of UMAP to a parametric optimization over neural network weights, learning a parametric relationship between data and embedding. We first demonstrate that parametric UMAP performs comparably to its nonparametric counterpart while conferring the benefit of a learned parametric mapping (e.g., fast online embeddings for new data). We then explore UMAP as a regularization, constraining the latent distribution of autoencoders, parametrically varying global structure preservation, and improving classifier accuracy for semisupervised learning by capturing structure in unlabeled data.
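The second step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes the low-dimensional similarity form from the original UMAP paper, 1 / (1 + a·d^(2b)), with a = b = 1 chosen only for illustration, and it uses a hypothetical one-layer linear `encoder` in place of a neural network. The edge list stands in for the graph from step (1), and no gradient step is performed; the sketch only evaluates the loss that a parametric optimizer would minimize over the encoder weights.

```python
import numpy as np

def low_dim_similarity(z_i, z_j, a=1.0, b=1.0):
    """Similarity of embedded points: 1 / (1 + a * d^(2b)). a, b are illustrative."""
    d2 = np.sum((z_i - z_j) ** 2, axis=-1)  # squared Euclidean distance
    return 1.0 / (1.0 + a * d2 ** b)

def umap_cross_entropy(p, q, eps=1e-7):
    """Mean cross-entropy between graph weights p and embedding similarities q."""
    q = np.clip(q, eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-p * np.log(q) - (1.0 - p) * np.log(1.0 - q)))

def encoder(x, weights):
    """Hypothetical linear encoder standing in for a neural network."""
    return x @ weights

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 4))            # toy data: 6 points, 4 features
weights = rng.normal(size=(4, 2))      # parameters mapping data to a 2-D embedding

edges = np.array([[0, 1], [1, 2], [3, 4]])      # assumed graph edges (p = 1)
non_edges = np.array([[0, 5], [2, 4], [1, 3]])  # assumed negative samples (p = 0)
pairs = np.concatenate([edges, non_edges])
p = np.concatenate([np.ones(len(edges)), np.zeros(len(non_edges))])

z = encoder(x, weights)                # embedding is a function of the weights
q = low_dim_similarity(z[pairs[:, 0]], z[pairs[:, 1]])
loss = umap_cross_entropy(p, q)
print(f"loss = {loss:.4f}")
```

Because the embedding is computed as `encoder(x, weights)` rather than stored per point, new data can be embedded by a single forward pass once the weights are trained, which is the benefit of the parametric mapping that the abstract highlights.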