Kohli Dhruv, Nieuwenhuis Johannes S, Cloninger Alexander, Mishne Gal, Narain Devika
bioRxiv. 2024 Oct 31:2024.10.31.621292. doi: 10.1101/2024.10.31.621292.
With the ubiquity of high-dimensional datasets in various biological fields, identifying low-dimensional topological manifolds within such datasets may reveal principles connecting latent variables to measurable instances in the world. The reliable discovery of such manifold structure in high-dimensional datasets can prove challenging, however, largely due to the introduction of distortion by leading manifold learning methods. The problem is further exacerbated by the lack of consensus on how to evaluate the quality of the recovered manifolds. Here, we present a novel measure of distortion to evaluate low-dimensional representations obtained using different techniques. We additionally develop a novel bottom-up manifold learning technique called Riemannian Alignment of Tangent Spaces (RATS) that aims to recover low-distortion embeddings of data, including the ability to embed closed manifolds into their intrinsic dimension using a unique tearing process. Compared to previous methods, we show that RATS provides low-distortion embeddings that excel in the visualization and deciphering of latent variables across a range of idealized, biological, and surrogate datasets that mimic real-world data.