Using Global t-SNE to Preserve Intercluster Data Structure.

Affiliations

Computational Neurobiology Laboratory, Salk Institute for Biological Studies, La Jolla, CA 92037, U.S.A.

Division of Biological Sciences, University of California San Diego, La Jolla, CA 92037, U.S.A.

Publication information

Neural Comput. 2022 Jul 14;34(8):1637-1651. doi: 10.1162/neco_a_01504.

Abstract

The t-distributed stochastic neighbor embedding (t-SNE) method is one of the leading techniques for data visualization and clustering. It finds a lower-dimensional embedding of the data points while minimizing distortions in the distances between neighboring points. By construction, t-SNE discards information about the large-scale structure of the data. We show that adding a global cost function to the t-SNE cost function makes it possible to cluster the data while preserving the global intercluster structure. We test the new global t-SNE (g-SNE) method on one synthetic data set and two real data sets, one of flower shapes and one of human brain cells. We find that significant and meaningful global structure exists in both the plant and the human brain data sets. In all cases, g-SNE outperforms t-SNE and UMAP in preserving the global structure. Topological analysis of the clustering result makes it possible to find an appropriate trade-off in how the data distribution is represented across scales. We also find differences in how data are distributed across scales between the two subjects in the human brain data set. Thus, by striving to produce both accurate clustering and accurate positioning between clusters, the g-SNE method can identify new aspects of data organization across scales.
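
The idea of adding a global cost to the t-SNE cost can be illustrated with a minimal NumPy sketch. The abstract does not state the form of the global term, so everything specific here is an assumption made for illustration: the global affinities are taken to grow with distance (proportional to 1 + d²), the two terms are combined with a weight `lam`, and the local term uses a single fixed Gaussian bandwidth `sigma` rather than the per-point, perplexity-based bandwidths of standard t-SNE. The function names are hypothetical and are not taken from the authors' code.

```python
import numpy as np

def pairwise_sq_dists(X):
    """Squared Euclidean distances between all rows of X."""
    sq = np.sum(X * X, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(D, 0.0)
    return np.maximum(D, 0.0)  # clip tiny negatives from round-off

def normalize_offdiag(W):
    """Zero the diagonal and normalize so all pair weights sum to 1."""
    W = W.copy()
    np.fill_diagonal(W, 0.0)
    return W / W.sum()

def kl_offdiag(P, Q, eps=1e-12):
    """KL(P || Q) summed over off-diagonal pairs."""
    mask = ~np.eye(P.shape[0], dtype=bool)
    p, q = P[mask], Q[mask]
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def combined_cost(X, Y, lam=1.0, sigma=1.0):
    """Local t-SNE-style cost plus lam times an ASSUMED global cost.

    X: high-dimensional data, shape (n, D)
    Y: low-dimensional embedding, shape (n, d)
    """
    DX = pairwise_sq_dists(X)
    DY = pairwise_sq_dists(Y)

    # Local term: Gaussian affinities in data space (fixed sigma is a
    # simplification), Student-t affinities in the embedding.
    P = normalize_offdiag(np.exp(-DX / (2.0 * sigma**2)))
    Q = normalize_offdiag(1.0 / (1.0 + DY))

    # Global term (assumption): weights that increase with distance,
    # so distant pairs dominate instead of neighbors.
    P_g = normalize_offdiag(1.0 + DX)
    Q_g = normalize_offdiag(1.0 + DY)

    local = kl_offdiag(P, Q)
    glob = kl_offdiag(P_g, Q_g)
    return local + lam * glob, local, glob

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(10, 3))
    Y = rng.normal(size=(10, 2))
    total, local, glob = combined_cost(X, Y, lam=1.0)
    print(f"local={local:.4f} global={glob:.4f} total={total:.4f}")
```

Setting `lam=0` reduces the sketch to the local term alone, while larger values put more weight on matching large distances. In practice the cost would be minimized over `Y` by gradient descent, which is omitted here.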
