The art of seeing the elephant in the room: 2D embeddings of single-cell data do make sense.

Author information

Lause Jan, Berens Philipp, Kobak Dmitry

Affiliations

Hertie Institute for AI in Brain Health, University of Tübingen, Tübingen, Germany.

Tübingen AI Center, University of Tübingen, Tübingen, Germany.

Publication information

PLoS Comput Biol. 2024 Oct 2;20(10):e1012403. doi: 10.1371/journal.pcbi.1012403. eCollection 2024 Oct.

DOI:10.1371/journal.pcbi.1012403

PMID:39356722

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11446450/

Abstract

A recent paper claimed that t-SNE and UMAP embeddings of single-cell datasets are "specious" and fail to capture true biological structure. The authors argued that such embeddings are as arbitrary and as misleading as forcing the data into an elephant shape. Here we show that this conclusion was based on inadequate and limited metrics of embedding quality. More appropriate metrics quantifying neighborhood and class preservation reveal the elephant in the room: while t-SNE and UMAP embeddings of single-cell data do not preserve high-dimensional distances, they can nevertheless provide biologically relevant information.
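
The abstract does not spell out how neighborhood and class preservation are quantified. A minimal sketch, assuming two commonly used stand-ins: kNN recall (the fraction of each point's high-dimensional nearest neighbors that remain nearest neighbors in the embedding) and leave-one-out kNN classification accuracy in the embedding. The function names, the toy data, and the choice of `k` are hypothetical and are not taken from the paper.

```python
import math
import random


def knn_indices(points, k):
    """Return, for each point, the set of indices of its k nearest neighbors."""
    neighbors = []
    for i, p in enumerate(points):
        # Euclidean distance from point i to every other point
        dists = [(math.dist(p, q), j) for j, q in enumerate(points) if j != i]
        dists.sort()
        neighbors.append({j for _, j in dists[:k]})
    return neighbors


def knn_recall(high_dim, low_dim, k=10):
    """Average fraction of each point's high-dimensional k nearest neighbors
    that are also among its k nearest neighbors in the embedding."""
    hi = knn_indices(high_dim, k)
    lo = knn_indices(low_dim, k)
    return sum(len(a & b) for a, b in zip(hi, lo)) / (k * len(high_dim))


def knn_accuracy(low_dim, labels, k=10):
    """Leave-one-out kNN accuracy in the embedding: each point is assigned
    the majority label among its k nearest neighbors."""
    correct = 0
    for i, nbrs in enumerate(knn_indices(low_dim, k)):
        votes = {}
        for j in nbrs:
            votes[labels[j]] = votes.get(labels[j], 0) + 1
        predicted = max(votes, key=votes.get)
        correct += predicted == labels[i]
    return correct / len(low_dim)


random.seed(0)
# Toy data (hypothetical): two well-separated clusters in 10 dimensions
X = [[random.gauss(c, 1.0) for _ in range(10)]
     for c in (0.0, 20.0) for _ in range(30)]
labels = [0] * 30 + [1] * 30
# Stand-in "embedding" (an assumption, not t-SNE or UMAP): first two coordinates
Y = [x[:2] for x in X]

recall = knn_recall(X, Y, k=10)
accuracy = knn_accuracy(Y, labels, k=10)
print(f"kNN recall: {recall:.2f}, kNN accuracy: {accuracy:.2f}")
```

Both scores ignore whether large high-dimensional distances are reproduced, which is why an embedding can score well on them while still distorting global distances. In practice `Y` would come from a t-SNE or UMAP run rather than from coordinate truncation.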

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3189/11446450/32b0ec429de3/pcbi.1012403.g001.jpg

Similar articles

The art of seeing the elephant in the room: 2D embeddings of single-cell data do make sense.

PLoS Comput Biol. 2024 Oct 2;20(10):e1012403. doi: 10.1371/journal.pcbi.1012403. eCollection 2024 Oct.

The art of seeing the elephant in the room: 2D embeddings of single-cell data do make sense.

bioRxiv. 2024 Jul 31:2024.03.26.586728. doi: 10.1101/2024.03.26.586728.

A generalization of t-SNE and UMAP to single-cell multimodal omics.

Genome Biol. 2021 May 3;22(1):130. doi: 10.1186/s13059-021-02356-5.

Statistical method scDEED for detecting dubious 2D single-cell embeddings and optimizing t-SNE and UMAP hyperparameters.

Nat Commun. 2024 Feb 26;15(1):1753. doi: 10.1038/s41467-024-45891-y.

scDEED: a statistical method for detecting dubious 2D single-cell embeddings and optimizing t-SNE and UMAP hyperparameters.

bioRxiv. 2023 Sep 15:2023.04.21.537839. doi: 10.1101/2023.04.21.537839.

Dimensionality Reduction of Single-Cell RNA-Seq Data.

Methods Mol Biol. 2021;2284:331-342. doi: 10.1007/978-1-0716-1307-8_18.

PARE: A framework for removal of confounding effects from any distance-based dimension reduction method.

PLoS Comput Biol. 2024 Jul 10;20(7):e1012241. doi: 10.1371/journal.pcbi.1012241. eCollection 2024 Jul.

Predicting User Preferences of Dimensionality Reduction Embedding Quality.

IEEE Trans Vis Comput Graph. 2023 Jan;29(1):745-755. doi: 10.1109/TVCG.2022.3209449. Epub 2022 Dec 16.

Clustering and visualization of single-cell RNA-seq data using path metrics.

PLoS Comput Biol. 2024 May 29;20(5):e1012014. doi: 10.1371/journal.pcbi.1012014. eCollection 2024 May.

Fast interpolation-based t-SNE for improved visualization of single-cell RNA-seq data.

Nat Methods. 2019 Mar;16(3):243-245. doi: 10.1038/s41592-018-0308-4. Epub 2019 Feb 11.

Cited by

shinyUMAP: an online tool for promoting understanding of single cell omics data visualization.

bioRxiv. 2025 Sep 1:2025.08.27.672621. doi: 10.1101/2025.08.27.672621.

Model-based dimensionality reduction for single-cell RNA-seq using generalized bilinear models.

Biostatistics. 2024 Dec 31;26(1). doi: 10.1093/biostatistics/kxaf024.

Sketching T cell atlases in the single-cell era: challenges and recommendations.

Immunol Cell Biol. 2025 Aug;103(7):723-737. doi: 10.1111/imcb.70040. Epub 2025 Jun 29.

Understanding rheumatic disease through continuous cell state analysis.

Nat Rev Rheumatol. 2025 May 7. doi: 10.1038/s41584-025-01253-6.

A Hands-On Introduction to Data Analytics for Biomedical Research.

Function (Oxf). 2025 Mar 24;6(2). doi: 10.1093/function/zqaf015.

Two-by-two ordinal patterns in art paintings.

PNAS Nexus. 2025 Mar 18;4(3):pgaf092. doi: 10.1093/pnasnexus/pgaf092. eCollection 2025 Mar.

Epithelial-mesenchymal transition couples with cell cycle arrest at various stages.

bioRxiv. 2025 Feb 28:2025.02.24.639880. doi: 10.1101/2025.02.24.639880.

Principled PCA separates signal from noise in omics count data.

bioRxiv. 2025 Feb 7:2025.02.03.636129. doi: 10.1101/2025.02.03.636129.

Spatial transcriptomic clocks reveal cell proximity effects in brain ageing.

Nature. 2025 Feb;638(8049):160-171. doi: 10.1038/s41586-024-08334-8. Epub 2024 Dec 18.

Combined LC-MS/MS feature grouping, statistical prioritization, and interactive networking in msFeaST.

Bioinformatics. 2024 Oct 1;40(10). doi: 10.1093/bioinformatics/btae584.

References

What cannot be seen correctly in 2D visualizations of single-cell 'omics data?

Cell Syst. 2023 Sep 20;14(9):723-731. doi: 10.1016/j.cels.2023.07.002.

The specious art of single-cell genomics.

PLoS Comput Biol. 2023 Aug 17;19(8):e1011288. doi: 10.1371/journal.pcbi.1011288. eCollection 2023 Aug.

Comparative analysis of dimension reduction methods for cytometry by time-of-flight data.

Nat Commun. 2023 Apr 1;14(1):1836. doi: 10.1038/s41467-023-37478-w.

Towards a comprehensive evaluation of dimension reduction methods for transcriptomic data visualization.

Commun Biol. 2022 Jul 19;5(1):719. doi: 10.1038/s42003-022-03628-x.

The art of using t-SNE for single-cell transcriptomics.

Nat Commun. 2019 Nov 28;10(1):5416. doi: 10.1038/s41467-019-13056-x.

Toward a Quantitative Survey of Dimension Reduction Techniques.

IEEE Trans Vis Comput Graph. 2021 Mar;27(3):2153-2173. doi: 10.1109/TVCG.2019.2944182. Epub 2021 Jan 28.

Dimensionality reduction for visualizing single-cell data using UMAP.

Nat Biotechnol. 2018 Dec 3. doi: 10.1038/nbt.4314.

Multidimensional Projection for Visual Analytics: Linking Techniques with Distortions, Tasks, and Layout Enrichment.

IEEE Trans Vis Comput Graph. 2019 Aug;25(8):2650-2673. doi: 10.1109/TVCG.2018.2846735. Epub 2018 Jun 13.
