Hughes Serena, Hamilton Timothy, Kolokotrones Tom, Deeds Eric J
Institute for Quantitative and Computational Biosciences, University of California, Los Angeles.
Bioinformatics Interdepartmental Program, University of California, Los Angeles.
ArXiv. 2025 Aug 26:arXiv:2508.19479v1.
Manifold learning builds on the "manifold hypothesis," which posits that data in high-dimensional datasets are drawn from lower-dimensional manifolds. Current tools generate global embeddings of data, rather than the local maps used to define manifolds mathematically. These tools also cannot assess whether the manifold hypothesis holds for a given dataset. Here, we describe DeepAtlas, an algorithm that generates lower-dimensional representations of the data's local neighborhoods, then trains deep neural networks that map between these local embeddings and the original data. Topological distortion is used to determine whether a dataset is drawn from a manifold and, if so, its dimensionality. Application to test datasets indicates that DeepAtlas can successfully learn manifold structures. Interestingly, many real datasets, including single-cell RNA-sequencing data, do not conform to the manifold hypothesis. In cases where data are drawn from a manifold, DeepAtlas builds a model that can be used generatively and promises to allow the application of powerful tools from differential geometry to a variety of datasets.
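The idea of describing a dataset through local maps rather than one global embedding can be illustrated with a small sketch. The code below is not the DeepAtlas algorithm: it assumes local PCA as a stand-in for the local embeddings, replaces the neural-network maps with linear projections, and uses an explained-variance threshold as a crude proxy for the topological-distortion criterion. The function name `local_charts` and the parameters `k` and `var_threshold` are hypothetical, chosen only for illustration.

```python
import numpy as np

def local_charts(X, k=10, var_threshold=0.95):
    """Embed each point's k-nearest-neighbor patch with local PCA.

    Returns a list of (neighbor_indices, local_coords, est_dim) tuples.
    Local PCA is an illustrative assumption, not the paper's method.
    """
    # Pairwise squared Euclidean distances (adequate for small n).
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    charts = []
    for i in range(X.shape[0]):
        idx = np.argsort(d2[i])[:k]  # the patch includes point i itself
        centered = X[idx] - X[idx].mean(axis=0)
        # SVD of the centered patch yields its principal directions.
        _, s, vt = np.linalg.svd(centered, full_matrices=False)
        var = s**2
        ratio = np.cumsum(var) / var.sum()
        # Smallest number of directions reaching the variance threshold.
        est_dim = int(np.searchsorted(ratio, var_threshold) + 1)
        coords = centered @ vt[:est_dim].T  # local low-dimensional chart
        charts.append((idx, coords, est_dim))
    return charts

# Toy data: a circle (a 1-D manifold) embedded in 3-D space.
t = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
X = np.column_stack([np.cos(t), np.sin(t), np.zeros_like(t)])
charts = local_charts(X, k=8)
dims = [chart[2] for chart in charts]
```

On this toy circle every patch is nearly a straight segment, so each local chart is one-dimensional even though the data live in three dimensions. A dataset that is not drawn from a manifold would be expected to show inconsistent or unstable local dimension estimates under a check of this kind, although the variance proxy used here is far weaker than the distortion measure the abstract describes.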