Chatzimparmpas Angelos, Martins Rafael M, Kerren Andreas
IEEE Trans Vis Comput Graph. 2020 Aug;26(8):2696-2714. doi: 10.1109/TVCG.2020.2986996. Epub 2020 Apr 13.
t-Distributed Stochastic Neighbor Embedding (t-SNE) for the visualization of multidimensional data has proven to be a popular approach, with successful applications in a wide range of domains. Despite their usefulness, t-SNE projections can be hard to interpret or even misleading, which hurts the trustworthiness of the results. Understanding the details of t-SNE itself and the reasons behind specific patterns in its output may be a daunting task, especially for non-experts in dimensionality reduction. In this article, we present t-viSNE, an interactive tool for the visual exploration of t-SNE projections that enables analysts to inspect different aspects of their accuracy and meaning, such as the effects of hyper-parameters, distance and neighborhood preservation, densities and costs of specific neighborhoods, and the correlations between dimensions and visual patterns. We propose a coherent, accessible, and well-integrated collection of different views for the visualization of t-SNE projections. The applicability and usability of t-viSNE are demonstrated through hypothetical usage scenarios with real data sets. Finally, we present the results of a user study in which the tool's effectiveness was evaluated. By bringing to light information that would normally be lost after running t-SNE, we hope to support analysts in using t-SNE and to make its results easier to understand.
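As an illustration of what "neighborhood preservation" can mean for a projection, the following is a minimal sketch and not t-viSNE's actual implementation. It assumes one common definition: for each point, the fraction of its k nearest neighbors in the original space that are also among its k nearest neighbors in the 2-D projection. The function names `knn_indices` and `neighborhood_preservation` are hypothetical, and Euclidean distance is an assumption.

```python
import math


def knn_indices(points, k):
    """For each point, return the set of indices of its k nearest neighbors.

    Uses brute-force Euclidean distance; fine for small illustrative data.
    """
    neighbors = []
    for i, p in enumerate(points):
        order = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        neighbors.append(set(order[:k]))
    return neighbors


def neighborhood_preservation(high_dim, low_dim, k):
    """Per-point share of high-dimensional k-NN kept in the projection.

    Returns a list of values in [0, 1]; 1.0 means every one of the
    point's k original neighbors is still a neighbor after projection.
    """
    if len(high_dim) != len(low_dim):
        raise ValueError("both point lists must have the same length")
    if not 1 <= k < len(high_dim):
        raise ValueError("k must satisfy 1 <= k < number of points")
    high = knn_indices(high_dim, k)
    low = knn_indices(low_dim, k)
    return [len(h & l) / k for h, l in zip(high, low)]
```

Low scores for a group of points would suggest that their apparent neighbors in the projection should not be trusted. The tool described above may compute this quantity differently.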