Assessing and improving reliability of neighbor embedding methods: a map-continuity perspective.

Author information

Liu Zhexuan, Ma Rong, Zhong Yiqiao

Affiliations

Department of Statistics, University of Wisconsin-Madison, Madison, WI, USA.

Department of Biostatistics, T.H. Chan School of Public Health, Harvard University, Boston, MA, USA.

Publication information

Nat Commun. 2025 May 30;16(1):5037. doi: 10.1038/s41467-025-60434-9.

DOI: 10.1038/s41467-025-60434-9
PMID: 40447630
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12125374/

Abstract

Visualizing high-dimensional data is essential for understanding biomedical data and deep learning models. Neighbor embedding methods, such as t-SNE and UMAP, are widely used but can introduce misleading visual artifacts. We find that the manifold learning interpretations from many prior works are inaccurate and that the misuse stems from a lack of data-independent notions of embedding maps, which project high-dimensional data into a lower-dimensional space. Leveraging the leave-one-out principle, we introduce LOO-map, a framework that extends embedding maps beyond discrete points to the entire input space. We identify two forms of map discontinuity that distort visualizations: one exaggerates cluster separation and the other creates spurious local structures. As a remedy, we develop two types of point-wise diagnostic scores to detect unreliable embedding points and improve hyperparameter selection, which are validated on datasets from computer vision and single-cell omics.
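
The leave-one-out idea in the abstract can be illustrated with a toy sketch. This is not the authors' LOO-map definition or either of their diagnostic scores; it is a simplified probe under stated assumptions: an existing 2-D embedding is frozen, one extra input point is placed by minimizing a t-SNE-like loss that involves only that point, and a fixed Gaussian bandwidth replaces perplexity calibration. The function `loo_embed`, its parameters, the synthetic clusters, and the hand-made embedding `Y` are all hypothetical.

```python
import numpy as np

def loo_embed(x_new, X, Y, sigma=1.0, lr=0.1, n_iter=300):
    """Place one new input x_new into a FROZEN embedding Y of inputs X.

    Assumptions (not from the paper): fixed bandwidth `sigma`, and a loss
    -sum_i p_i log q_i that only involves the new point.
    """
    # Input-space affinities of the new point to each existing point.
    sq = np.sum((X - x_new) ** 2, axis=1)
    p = np.exp(-(sq - sq.min()) / (2.0 * sigma ** 2))  # shifted for stability
    p /= p.sum()

    # Start from the affinity-weighted mean of the frozen embedding points.
    y = p @ Y
    for _ in range(n_iter):
        diff = y - Y                                   # shape (n, 2)
        w = 1.0 / (1.0 + np.sum(diff ** 2, axis=1))    # Student-t kernel
        q = w / w.sum()
        # Gradient of -sum_i p_i log q_i with respect to y.
        grad = 2.0 * ((p - q) * w) @ diff
        y = y - lr * grad
    return y

rng = np.random.default_rng(0)
n, d = 50, 10
center_a = np.zeros(d)
center_b = np.zeros(d)
center_b[0] = 6.0

# Two synthetic input clusters (hypothetical data).
X = np.vstack([center_a + rng.normal(size=(n, d)),
               center_b + rng.normal(size=(n, d))])

# Hypothetical frozen embedding: two well-separated 2-D blobs,
# standing in for the output of a neighbor embedding method.
Y = np.vstack([rng.normal(size=(n, 2)) + [-10.0, 0.0],
               rng.normal(size=(n, 2)) + [10.0, 0.0]])

# Move the extra input along a straight line between the cluster centers
# and record where it lands in the embedding at each step.
ts = np.linspace(0.0, 1.0, 21)
path = np.array([loo_embed((1 - t) * center_a + t * center_b, X, Y)
                 for t in ts])

# Step sizes in the embedding for equal-sized steps in the input.
jumps = np.linalg.norm(np.diff(path, axis=0), axis=1)
print("largest jump between consecutive path points:", jumps.max())
```

If one step in `jumps` is much larger than the others even though the input moved by the same amount each time, the map from input to embedding changes abruptly there, which is the kind of behavior the abstract associates with exaggerated cluster separation.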

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5a92/12125374/ddb3e19297e6/41467_2025_60434_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5a92/12125374/4819ee943530/41467_2025_60434_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5a92/12125374/476b0bd6598c/41467_2025_60434_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5a92/12125374/d8d2975293f6/41467_2025_60434_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5a92/12125374/54b9db2bd62d/41467_2025_60434_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5a92/12125374/1d7cdbd7b9f9/41467_2025_60434_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5a92/12125374/b2d2be6a10f2/41467_2025_60434_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5a92/12125374/6001fd0e2f84/41467_2025_60434_Fig8_HTML.jpg

Similar articles

1. Assessing and improving reliability of neighbor embedding methods: a map-continuity perspective.
   Nat Commun. 2025 May 30;16(1):5037. doi: 10.1038/s41467-025-60434-9.
2. DGCyTOF: Deep learning with graphic cluster visualization to predict cell types of single cell mass cytometry data.
   PLoS Comput Biol. 2022 Apr 11;18(4):e1008885. doi: 10.1371/journal.pcbi.1008885. eCollection 2022 Apr.
3. Statistical method scDEED for detecting dubious 2D single-cell embeddings and optimizing t-SNE and UMAP hyperparameters.
   Nat Commun. 2024 Feb 26;15(1):1753. doi: 10.1038/s41467-024-45891-y.
4. Deep Recursive Embedding for High-Dimensional Data.
   IEEE Trans Vis Comput Graph. 2022 Feb;28(2):1237-1248. doi: 10.1109/TVCG.2021.3122388. Epub 2021 Dec 30.
5. Visualizing single-cell data with the neighbor embedding spectrum.
   bioRxiv. 2024 Apr 29:2024.04.26.590867. doi: 10.1101/2024.04.26.590867.
6. Shape-aware stochastic neighbor embedding for robust data visualisations.
   BMC Bioinformatics. 2022 Nov 14;23(1):477. doi: 10.1186/s12859-022-05028-8.
7. Dimensionality reduction by UMAP reinforces sample heterogeneity analysis in bulk transcriptomic data.
   Cell Rep. 2021 Jul 27;36(4):109442. doi: 10.1016/j.celrep.2021.109442.
8. Embedding Functional Brain Networks in Low Dimensional Spaces Using Manifold Learning Techniques.
   Front Neuroinform. 2021 Dec 24;15:740143. doi: 10.3389/fninf.2021.740143. eCollection 2021.
9. Assessing single-cell transcriptomic variability through density-preserving data visualization.
   Nat Biotechnol. 2021 Jun;39(6):765-774. doi: 10.1038/s41587-020-00801-7. Epub 2021 Jan 18.
10. Capturing discrete latent structures: choose LDs over PCs.
    Biostatistics. 2022 Dec 12;24(1):1-16. doi: 10.1093/biostatistics/kxab030.
