Parametric UMAP Embeddings for Representation and Semisupervised Learning.

Authors

Sainburg Tim, McInnes Leland, Gentner Timothy Q

Affiliations

University of California San Diego, La Jolla, CA 92093, U.S.A.

Tutte Institute for Mathematics and Computing, Ottawa, Ontario, Canada

Publication information

Neural Comput. 2021 Oct 12;33(11):2881-2907. doi: 10.1162/neco_a_01434.

UMAP is a nonparametric graph-based dimensionality reduction algorithm using applied Riemannian geometry and algebraic topology to find low-dimensional embeddings of structured data. The UMAP algorithm consists of two steps: (1) computing a graphical representation of a data set (fuzzy simplicial complex) and (2) through stochastic gradient descent, optimizing a low-dimensional embedding of the graph. Here, we extend the second step of UMAP to a parametric optimization over neural network weights, learning a parametric relationship between data and embedding. We first demonstrate that parametric UMAP performs comparably to its nonparametric counterpart while conferring the benefit of a learned parametric mapping (e.g., fast online embeddings for new data). We then explore UMAP as a regularization, constraining the latent distribution of autoencoders, parametrically varying global structure preservation, and improving classifier accuracy for semisupervised learning by capturing structure in unlabeled data.
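The two steps described in the abstract, and the parametric variant of the second step, can be illustrated with a toy sketch. This is not the authors' implementation. It makes several loudly simplifying assumptions: the graph weights use the mean neighbor distance as a bandwidth rather than UMAP's per-point search, a single linear layer stands in for the neural network, the curve parameters `a` and `b` are fixed at 1, the loss sums over all pairs instead of sampling edges, and finite-difference gradients with step rejection replace stochastic gradient descent. All function names here are hypothetical.

```python
import math
import random


def fuzzy_knn_graph(points, k=3):
    """Step 1 (simplified): weighted kNN graph, symmetrized by fuzzy union."""
    graph = {}
    for i in range(len(points)):
        nearest = sorted(
            (math.dist(points[i], points[j]), j)
            for j in range(len(points)) if j != i
        )[:k]
        rho = nearest[0][0]  # distance to the closest neighbor
        sigma = (sum(d for d, _ in nearest) / k) or 1.0  # assumed bandwidth
        for d, j in nearest:
            w = math.exp(-max(0.0, d - rho) / sigma)
            key = (min(i, j), max(i, j))
            prev = graph.get(key, 0.0)
            graph[key] = prev + w - prev * w  # fuzzy union of both directions
    return graph


def encode(weights, x):
    """Parametric mapping: a linear layer standing in for a neural network."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def umap_loss(weights, points, graph, a=1.0, b=1.0, eps=1e-6):
    """Cross-entropy between graph weights p and embedding similarities q."""
    z = [encode(weights, x) for x in points]
    total = 0.0
    for i in range(len(z)):
        for j in range(i + 1, len(z)):
            p = graph.get((i, j), 0.0)
            q = 1.0 / (1.0 + a * math.dist(z[i], z[j]) ** (2 * b))
            q = min(max(q, eps), 1.0 - eps)  # keep the logarithms finite
            total -= p * math.log(q) + (1.0 - p) * math.log(1.0 - q)
    return total


def train(points, graph, dim=2, steps=30, lr=0.1, h=1e-4, seed=0):
    """Step 2 (parametric): optimize encoder weights, not point coordinates."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-1, 1) for _ in points[0]] for _ in range(dim)]
    loss = umap_loss(weights, points, graph)
    for _ in range(steps):
        grad = []
        for r in range(dim):
            grad_row = []
            for c in range(len(weights[r])):
                bumped = [row[:] for row in weights]
                bumped[r][c] += h
                grad_row.append((umap_loss(bumped, points, graph) - loss) / h)
            grad.append(grad_row)
        candidate = [
            [w - lr * g for w, g in zip(row, grad_row)]
            for row, grad_row in zip(weights, grad)
        ]
        new_loss = umap_loss(candidate, points, graph)
        if new_loss < loss:
            weights, loss = candidate, new_loss  # accept the improving step
        else:
            lr *= 0.5  # reject the step and shrink the learning rate
    return weights, loss


# Two small synthetic clusters in 3-D (made-up data for illustration only).
data_rng = random.Random(1)
data = [[data_rng.gauss(c, 0.1) for _ in range(3)]
        for c in (0.0, 3.0) for _ in range(8)]

graph = fuzzy_knn_graph(data, k=3)
_, init_loss = train(data, graph, steps=0)
weights, final_loss = train(data, graph, steps=30)

# A learned mapping embeds unseen data without re-running the optimization.
embedding = encode(weights, [3.0, 3.0, 3.0])
```

Because the optimization acts on `weights` rather than on per-point coordinates, embedding a new point is a single call to `encode`, which is the "fast online embeddings for new data" benefit the abstract refers to.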
