Conditional t-SNE: more informative t-SNE embeddings.

Author information

Kang Bo, García García Darío, Lijffijt Jefrey, Santos-Rodríguez Raúl, De Bie Tijl

Affiliations

Department of Electronics and Information Systems, IDLab, Ghent University, Ghent, Belgium.

Facebook AI, New York, USA.

Publication information

Mach Learn. 2021;110(10):2905-2940. doi: 10.1007/s10994-020-05917-0. Epub 2020 Dec 6.

Abstract

Dimensionality reduction and manifold learning methods such as t-distributed stochastic neighbor embedding (t-SNE) are frequently used to map high-dimensional data into a two-dimensional space to visualize and explore that data. Beyond the specifics of t-SNE, any such approach has two substantial limitations: (1) not all information can be captured in a single two-dimensional embedding, and (2) well-informed users often already know the salient structure of such an embedding, which prevents them from obtaining genuinely new insights. Currently, it is not known how to extract the remaining information in a similarly effective manner. We introduce conditional t-SNE (ct-SNE), a generalization of t-SNE that discounts prior information in the form of labels. This yields more informative and more relevant embeddings. To achieve this, we propose a conditioned version of the t-SNE objective, resulting in an elegant method with a single integrated objective. We show how to optimize this objective efficiently and study the effects of the extra parameter that ct-SNE introduces over t-SNE. Qualitative and quantitative empirical results on synthetic and real data show that ct-SNE is scalable, effective, and achieves its goal: it allows complementary structure to be captured in the embedding and provides new insights into real data.
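The central idea, conditioning the t-SNE objective on known labels, can be illustrated with a small pure-Python sketch. It computes the standard t-SNE quantities (perplexity-calibrated Gaussian input affinities P, Student-t embedding affinities Q, and the cost KL(P‖Q)) and then a label-conditioned variant. The conditioning used here is an assumption made for illustration only: affinities of pairs with different labels are multiplied by a hypothetical parameter `beta` in (0, 1], so that same-label proximity is expected a priori and need not be encoded by the embedding; `beta = 1` recovers plain t-SNE. The paper's exact conditioned objective, its extra parameter, and its optimizer may differ, and no gradient step is shown.

```python
import math
from typing import List, Optional, Sequence

Point = Sequence[float]
Matrix = List[List[float]]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def input_affinities(X: Sequence[Point], perplexity: float = 5.0, iters: int = 100) -> Matrix:
    """Standard t-SNE input affinities P (symmetric, summing to 1).

    For each point i, bisect on the Gaussian precision b = 1 / (2 * sigma_i**2)
    until the entropy of p(j|i) matches ln(perplexity).
    """
    n = len(X)
    D = [[sq_dist(X[i], X[j]) for j in range(n)] for i in range(n)]
    target = math.log(perplexity)
    cond = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = [j for j in range(n) if j != i]
        d_min = min(D[i][j] for j in others)  # shift so the nearest weight is exp(0) = 1
        lo, hi, b = 0.0, math.inf, 1.0
        for _ in range(iters):
            w = {j: math.exp(-b * (D[i][j] - d_min)) for j in others}
            total = sum(w.values())
            p = {j: wj / total for j, wj in w.items()}
            entropy = -sum(pj * math.log(pj) for pj in p.values() if pj > 0)
            if entropy > target:  # too flat: increase precision
                lo = b
                b = b * 2 if hi == math.inf else (lo + hi) / 2
            else:  # too peaked: decrease precision
                hi = b
                b = (lo + hi) / 2
        for j in others:
            cond[i][j] = p[j]
    # Symmetrize: P_ij = (p(j|i) + p(i|j)) / (2n)
    return [[(cond[i][j] + cond[j][i]) / (2 * n) for j in range(n)] for i in range(n)]


def output_affinities(Y: Sequence[Point], labels: Optional[Sequence[int]] = None,
                      beta: float = 1.0) -> Matrix:
    """Student-t embedding affinities Q, optionally reweighted by labels.

    HYPOTHETICAL conditioning (illustration only): pairs with different labels
    are down-weighted by `beta`, so same-label closeness is 'already expected'.
    labels=None or beta=1.0 gives the plain t-SNE Q.
    """
    n = len(Y)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            w = 1.0 / (1.0 + sq_dist(Y[i], Y[j]))  # heavy-tailed Student-t kernel
            if labels is not None and labels[i] != labels[j]:
                w *= beta
            W[i][j] = w
    Z = sum(map(sum, W))
    return [[w / Z for w in row] for row in W]


def kl_divergence(P: Matrix, Q: Matrix) -> float:
    """The t-SNE cost KL(P || Q), summed over pairs with P_ij > 0."""
    return sum(p * math.log(p / q)
               for rp, rq in zip(P, Q) for p, q in zip(rp, rq) if p > 0)


# Toy data: two far-apart clusters whose membership is the known label.
A = [(0.0, 0.0), (1.0, 0.2), (0.3, 1.1), (1.4, 1.3), (0.8, 2.0)]
X = A + [(x + 50.0, y) for x, y in A]
labels = [0] * 5 + [1] * 5
# An embedding that ignores the label structure: labels alternate along a line.
Y = [(2.0 * k, 0.0) for k in range(5)] + [(2.0 * k + 1.0, 0.0) for k in range(5)]

P = input_affinities(X, perplexity=2.0)
plain = kl_divergence(P, output_affinities(Y))
conditioned = kl_divergence(P, output_affinities(Y, labels, beta=0.1))
print(f"t-SNE cost: {plain:.3f}, label-conditioned cost: {conditioned:.3f}")
```

In this toy setup the embedding does not show the two labeled clusters, so the plain t-SNE cost penalizes it, whereas the conditioned cost is lower because the labels already account for that structure. The O(n²) loops are kept for readability; practical t-SNE implementations typically use vectorized or approximate (e.g. Barnes-Hut) computations.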

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b266/8599264/b1f5c196eabe/10994_2020_5917_Fig1_HTML.jpg