Wu Lirong, Yuan Lifan, Zhao Guojiang, Lin Haitao, Li Stan Z
IEEE Trans Neural Netw Learn Syst. 2023 Nov;34(11):8543-8554. doi: 10.1109/TNNLS.2022.3151498. Epub 2023 Oct 27.
High-dimensional data analysis for exploration and discovery involves two fundamental tasks: deep clustering and data visualization. When the two tasks are performed separately, as has usually been the case, they can disagree on geometry preservation: clustering often corrupts the geometric structure of the data, whereas visualization aims to preserve it for better interpretation. Achieving deep clustering and data visualization in a single end-to-end framework is therefore an important but challenging problem. In this article, we propose a novel neural network-based method, called deep clustering and visualization (DCV), that performs both tasks end-to-end and resolves their disagreement. The DCV framework consists of two nonlinear dimensionality reduction (NLDR) transformations: 1) one from the input data space to a latent feature space for clustering and 2) one from the latent feature space to the final 2-D space for visualization. Importantly, the first NLDR transformation is optimized mainly by a Clustering Loss, which allows arbitrary corruption of the geometric structure in favor of better clustering, while the second is optimized by a Geometry-Preserving Loss that recovers the corrupted geometry for better visualization. Extensive comparative results show that the DCV framework outperforms other leading clustering-visualization algorithms in terms of both quantitative evaluation metrics and qualitative visualization.
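The two-stage structure described in the abstract can be illustrated with a minimal forward-pass sketch. The abstract does not define the networks or the two losses, so everything concrete here is an assumption made for illustration: the layer sizes, the number of clusters, a DEC-style KL clustering loss standing in for the Clustering Loss, and a pairwise-distance mismatch between the input space and the 2-D space standing in for the Geometry-Preserving Loss. The helper names (`mlp`, `init`, `clustering_loss`, `geometry_loss`) are hypothetical and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)


def init(sizes):
    """Random weights for a small MLP with the given layer sizes (hypothetical helper)."""
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def mlp(x, weights):
    """Tanh hidden layers followed by a linear output layer."""
    for w, b in weights[:-1]:
        x = np.tanh(x @ w + b)
    w, b = weights[-1]
    return x @ w + b


def pairwise_dist(a):
    """Euclidean distance matrix between the rows of a."""
    sq = np.sum(a ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (a @ a.T)
    return np.sqrt(np.maximum(d2, 0.0))


def clustering_loss(z, centers):
    """Assumed stand-in: DEC-style KL(p || q) with Student's t soft assignments."""
    d2 = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    q = 1.0 / (1.0 + d2)
    q /= q.sum(axis=1, keepdims=True)          # soft cluster assignments
    p = q ** 2 / q.sum(axis=0)                 # sharpened target distribution
    p /= p.sum(axis=1, keepdims=True)
    return float(np.sum(p * np.log(p / q)) / len(z)), q


def geometry_loss(x, y):
    """Assumed stand-in: mismatch between scale-normalized pairwise distances."""
    dx, dy = pairwise_dist(x), pairwise_dist(y)
    dx, dy = dx / dx.mean(), dy / dy.mean()
    return float(np.mean((dx - dy) ** 2))


# Toy data and sizes, all chosen arbitrarily for the sketch.
x = rng.normal(size=(64, 20))
f1 = init([20, 32, 10])                # NLDR 1: input space -> latent space
f2 = init([10, 32, 2])                 # NLDR 2: latent space -> 2-D space
centers = rng.normal(size=(3, 10))     # cluster centers in the latent space

z = mlp(x, f1)                         # latent features used for clustering
y = mlp(z, f2)                         # 2-D embedding used for visualization

l_clu, q = clustering_loss(z, centers)  # would drive the first transformation
l_geo = geometry_loss(x, y)             # would drive the second transformation
labels = q.argmax(axis=1)               # hard cluster labels from soft assignments
```

In a trainable version, the gradient of `l_clu` would mainly update `f1` and the centers, while the gradient of `l_geo` would update `f2`; that separation is what lets the latent space distort geometry for the sake of clustering while the 2-D map tries to restore it. Which reference geometry the paper's loss actually targets is not stated in the abstract, so comparing against the input space here is a guess.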