Lause Jan, Kobak Dmitry, Berens Philipp
bioRxiv. 2024 Jul 31:2024.03.26.586728. doi: 10.1101/2024.03.26.586728.
A recent paper (Chari and Pachter, 2023) claimed that t-SNE and UMAP embeddings of single-cell datasets fail to capture true biological structure. The authors argued that such embeddings are as arbitrary and as misleading as forcing the data into an elephant shape. Here we show that this conclusion was based on inadequate and limited metrics of embedding quality. More appropriate metrics quantifying neighborhood and class preservation reveal the elephant in the room: while t-SNE and UMAP embeddings of single-cell data do not preserve high-dimensional distances, they can nevertheless provide biologically relevant information.
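To make "neighborhood and class preservation" concrete, the following is a minimal sketch of two metrics of that kind: the fraction of each point's nearest neighbors that survive in the embedding, and the accuracy of a nearest-neighbor classifier run in the embedding. The function names (`knn_indices`, `knn_recall`, `knn_accuracy`), the choice of k = 10, and the synthetic data are assumptions made for illustration; the abstract does not specify how the authors define or compute their metrics, so this should not be read as their implementation.

```python
import numpy as np


def knn_indices(X, k):
    """Indices of the k nearest Euclidean neighbors of each row, excluding the row itself."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # squared pairwise distances
    np.fill_diagonal(d2, np.inf)                      # a point is not its own neighbor
    return np.argsort(d2, axis=1)[:, :k]


def knn_recall(X_high, X_low, k=10):
    """Mean fraction of each point's k high-dimensional neighbors that are
    also among its k neighbors in the embedding (neighborhood preservation)."""
    nn_high = knn_indices(X_high, k)
    nn_low = knn_indices(X_low, k)
    overlap = [len(set(a) & set(b)) for a, b in zip(nn_high, nn_low)]
    return float(np.mean(overlap)) / k


def knn_accuracy(X_low, labels, k=10):
    """Leave-one-out accuracy of a majority-vote kNN classifier in the embedding
    (class preservation)."""
    labels = np.asarray(labels)
    preds = []
    for row in knn_indices(X_low, k):
        values, counts = np.unique(labels[row], return_counts=True)
        preds.append(values[np.argmax(counts)])
    return float(np.mean(np.array(preds) == labels))


# Synthetic stand-in for single-cell data: two classes in 50 dimensions,
# separated along the first coordinate (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
X[100:, 0] += 20.0
labels = np.repeat([0, 1], 100)

emb_structured = X[:, :2]                 # toy "embedding" that keeps the class axis
emb_random = rng.normal(size=(200, 2))    # embedding unrelated to the data

print("kNN recall, structured:", knn_recall(X, emb_structured))
print("kNN recall, random:    ", knn_recall(X, emb_random))
print("kNN accuracy, structured:", knn_accuracy(emb_structured, labels))
print("kNN accuracy, random:    ", knn_accuracy(emb_random, labels))
```

Both metrics are local: they ask only whether nearby points and class membership survive the embedding, so an embedding can score well on them while still distorting high-dimensional distances. The brute-force distance matrix is quadratic in the number of points and is meant for small examples only.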