Jan Lause, Philipp Berens, Dmitry Kobak
Hertie Institute for AI in Brain Health, University of Tübingen, Tübingen, Germany.
Tübingen AI Center, University of Tübingen, Tübingen, Germany.
PLoS Comput Biol. 2024 Oct 2;20(10):e1012403. doi: 10.1371/journal.pcbi.1012403. eCollection 2024 Oct.
A recent paper claimed that t-SNE and UMAP embeddings of single-cell datasets are "specious" and fail to capture true biological structure. The authors argued that such embeddings are as arbitrary and as misleading as forcing the data into an elephant shape. Here we show that this conclusion was based on inadequate and limited metrics of embedding quality. More appropriate metrics quantifying neighborhood and class preservation reveal the elephant in the room: while t-SNE and UMAP embeddings of single-cell data do not preserve high-dimensional distances, they can nevertheless provide biologically relevant information.
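As a rough illustration of what metrics of neighborhood and class preservation could look like, the sketch below computes two commonly used quantities with NumPy only: kNN recall (the fraction of each point's high-dimensional nearest neighbors that remain nearest neighbors in the embedding) and leave-one-out kNN classification accuracy in the embedding. The function names, the choice of k = 10, and the Euclidean metric are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def knn_indices(X, k):
    """Return indices of the k nearest neighbors (Euclidean) of each row, excluding the row itself."""
    sq = np.sum(X ** 2, axis=1)
    # Pairwise squared distances via the expansion |a - b|^2 = |a|^2 + |b|^2 - 2 a.b
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(d2, np.inf)  # a point is never its own neighbor
    return np.argsort(d2, axis=1)[:, :k]


def knn_recall(X_high, X_low, k=10):
    """Neighborhood preservation: mean share of high-dim kNN that are also kNN in the embedding."""
    nn_high = knn_indices(X_high, k)
    nn_low = knn_indices(X_low, k)
    overlap = [len(set(a) & set(b)) for a, b in zip(nn_high, nn_low)]
    return float(np.mean(overlap)) / k


def knn_accuracy(X_low, labels, k=10):
    """Class preservation: leave-one-out kNN accuracy in the embedding.

    Assumes labels are non-negative integers (required by np.bincount).
    """
    labels = np.asarray(labels)
    neighbor_labels = labels[knn_indices(X_low, k)]  # shape (n_points, k)
    pred = np.array([np.bincount(row).argmax() for row in neighbor_labels])
    return float(np.mean(pred == labels))
```

Under these definitions, an embedding can score poorly on distance preservation and still score well on both quantities, which is the distinction the abstract draws. An arbitrary layout that ignores the data would be expected to score low on kNN recall.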