Using Global t-SNE to Preserve Intercluster Data Structure.

Affiliations

Computational Neurobiology Laboratory, Salk Institute for Biological Studies, La Jolla, CA 92037, U.S.A.

Division of Biological Sciences, University of California San Diego, La Jolla, CA 92037, U.S.A.

Publication information

Neural Comput. 2022 Jul 14;34(8):1637-1651. doi: 10.1162/neco_a_01504.

DOI:10.1162/neco_a_01504

PMID:35798323

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10010455/

Abstract

The t-distributed stochastic neighbor embedding (t-SNE) method is one of the leading techniques for data visualization and clustering. This method finds a lower-dimensional embedding of data points while minimizing distortions in distances between neighboring data points. By construction, t-SNE discards information about the large-scale structure of the data. We show that adding a global cost function to the t-SNE cost function makes it possible to cluster the data while preserving global intercluster data structure. We test the new global t-SNE (g-SNE) method on one synthetic and two real data sets on flower shapes and human brain cells. We find that significant and meaningful global structure exists in both the plant and human brain data sets. In all cases, g-SNE outperforms t-SNE and UMAP in preserving the global structure. Topological analysis of the clustering result makes it possible to find an appropriate trade-off of data distribution across scales. We find differences in how data are distributed across scales between the two subjects that were part of the human brain data set. Thus, by striving to produce both accurate clustering and positioning between clusters, the g-SNE method can identify new aspects of data organization across scales.
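The idea of adding a global cost to the t-SNE cost can be illustrated with a minimal NumPy sketch. The abstract does not give the form of the global term, so everything specific here is an assumption made for illustration: the global affinities are taken to grow with squared distance (`1 + d^2`), the two terms are combined with a hypothetical weight `lam`, and the local term uses a single Gaussian bandwidth `sigma` rather than the per-point, perplexity-calibrated bandwidths of standard t-SNE. It only evaluates a cost for a given embedding; it is not an optimizer and should not be read as the authors' implementation.

```python
import numpy as np


def pairwise_sq_dists(points):
    """Squared Euclidean distances between all rows of `points`."""
    sq_norms = np.sum(points ** 2, axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * points @ points.T
    return np.maximum(d2, 0.0)  # clip tiny negatives caused by round-off


def normalize_off_diagonal(weights):
    """Zero the diagonal and scale the remaining entries to sum to 1."""
    w = weights.copy()
    np.fill_diagonal(w, 0.0)
    return w / w.sum()


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q), summed over entries where p > 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / (q[mask] + eps))))


def combined_cost(x_high, y_low, lam=1.0, sigma=1.0):
    """Return (total, local, global) cost for embedding y_low of x_high.

    ASSUMPTIONS (not taken from the abstract): single bandwidth `sigma`,
    global affinities proportional to 1 + d^2, and weight `lam`.
    """
    d2_high = pairwise_sq_dists(x_high)
    d2_low = pairwise_sq_dists(y_low)

    # Local (t-SNE-like) term: affinities decay with distance, so
    # neighboring points dominate the cost.
    p_local = normalize_off_diagonal(np.exp(-d2_high / (2.0 * sigma ** 2)))
    q_local = normalize_off_diagonal(1.0 / (1.0 + d2_low))

    # Global term (assumed form): affinities grow with distance, so
    # far-apart points and clusters dominate the cost.
    p_global = normalize_off_diagonal(1.0 + d2_high)
    q_global = normalize_off_diagonal(1.0 + d2_low)

    local = kl_divergence(p_local, q_local)
    global_ = kl_divergence(p_global, q_global)
    return local + lam * global_, local, global_


# Example: score a naive 2-D embedding (first two coordinates) of 5-D data.
rng = np.random.default_rng(0)
x = rng.normal(size=(20, 5))
total, local, global_ = combined_cost(x, x[:, :2], lam=0.5)
```

With `lam=0` the sketch reduces to the local term alone; increasing `lam` shifts the balance toward matching large distances, which is the kind of trade-off across scales the abstract refers to.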

Similar articles

Shape-aware stochastic neighbor embedding for robust data visualisations.

BMC Bioinformatics. 2022 Nov 14;23(1):477. doi: 10.1186/s12859-022-05028-8.

Supervised capacity preserving mapping: a clustering guided visualization method for scRNA-seq data.

Bioinformatics. 2022 Apr 28;38(9):2496-2503. doi: 10.1093/bioinformatics/btac131.

Self-Organizing Nebulous Growths for Robust and Incremental Data Visualization.

IEEE Trans Neural Netw Learn Syst. 2021 Oct;32(10):4588-4602. doi: 10.1109/TNNLS.2020.3023941. Epub 2021 Oct 5.

DGCyTOF: Deep learning with graphic cluster visualization to predict cell types of single cell mass cytometry data.

PLoS Comput Biol. 2022 Apr 11;18(4):e1008885. doi: 10.1371/journal.pcbi.1008885. eCollection 2022 Apr.

Visualization and unsupervised predictive clustering of high-dimensional multimodal neuroimaging data.

J Neurosci Methods. 2014 Oct 30;236:19-25. doi: 10.1016/j.jneumeth.2014.08.001. Epub 2014 Aug 10.

Dimensionality reduction by UMAP reinforces sample heterogeneity analysis in bulk transcriptomic data.

Cell Rep. 2021 Jul 27;36(4):109442. doi: 10.1016/j.celrep.2021.109442.

Visualizing Single-Cell RNA-seq Data with Semisupervised Principal Component Analysis.

Int J Mol Sci. 2020 Aug 12;21(16):5797. doi: 10.3390/ijms21165797.

A Tool for Interactive Data Visualization: Application to Over 10,000 Brain Imaging and Phantom MRI Data Sets.

Front Neuroinform. 2016 Mar 15;10:9. doi: 10.3389/fninf.2016.00009. eCollection 2016.

Statistical method scDEED for detecting dubious 2D single-cell embeddings and optimizing t-SNE and UMAP hyperparameters.

Nat Commun. 2024 Feb 26;15(1):1753. doi: 10.1038/s41467-024-45891-y.

Cited by

Dissecting Morphological and Functional Dynamics of Non-Tumorigenic and Triple-Negative Breast Cancer Cell Lines Using PCA and t-SNE Analysis.

Cancer Rep (Hoboken). 2025 Jul;8(7):e70257. doi: 10.1002/cnr2.70257.

Assessing and improving reliability of neighbor embedding methods: a map-continuity perspective.

Nat Commun. 2025 May 30;16(1):5037. doi: 10.1038/s41467-025-60434-9.

In vitro Validation of a Novel Disposable Remover to Remove Activated Leukocytes Generated During Cardiopulmonary Bypass: A Pilot Study.

J Inflamm Res. 2025 Apr 18;18:5355-5370. doi: 10.2147/JIR.S503575. eCollection 2025.

Assessing the Causal Relationship Between Plasma Proteins and Pulmonary Fibrosis: A Systematic Analysis Based on Mendelian Randomization.

Biology (Basel). 2025 Feb 14;14(2):200. doi: 10.3390/biology14020200.

Dimensionality reduction for visualizing spatially resolved profiling data using SpaSNE.

Gigascience. 2025 Jan 6;14. doi: 10.1093/gigascience/giaf002.

Enhancing chemical synthesis research with NLP: Word embeddings for chemical reagent identification-A case study on nano-FeCu.

iScience. 2024 Aug 29;27(10):110780. doi: 10.1016/j.isci.2024.110780. eCollection 2024 Oct 18.

Food Chemicals and Epigenetic Targets: An Epi Food Chemical Database.

ACS Omega. 2024 May 29;9(23):25322-25331. doi: 10.1021/acsomega.4c03321. eCollection 2024 Jun 11.

AI-Enhanced evaluation of YouTube content on post-surgical incontinence following pelvic cancer treatment.

SSM Popul Health. 2024 May 4;26:101677. doi: 10.1016/j.ssmph.2024.101677. eCollection 2024 Jun.

Imaging-based chromatin and epigenetic age, ImAge, quantitates aging and rejuvenation.

Res Sq. 2023 Nov 7:rs.3.rs-3479973. doi: 10.21203/rs.3.rs-3479973/v1.

Single-cell analysis technologies for cancer research: from tumor-specific single cell discovery to cancer therapy.

Front Genet. 2023 Oct 12;14:1276959. doi: 10.3389/fgene.2023.1276959. eCollection 2023.
