Alcaide Daniel, Aerts Jan
Department of Electrical Engineering (ESAT) STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics, KU Leuven, Leuven, Belgium.
imec, KU Leuven, Leuven, Belgium.
PeerJ Comput Sci. 2018 Jan 29;4:e145. doi: 10.7717/peerj-cs.145. eCollection 2018.
Finding useful patterns in datasets has attracted considerable interest in the field of visual analytics. One of the most common tasks is the identification and representation of clusters. However, this is non-trivial in heterogeneous datasets since the data needs to be analyzed from different perspectives. Indeed, highly variable patterns may mask underlying trends in the dataset. Dendrograms are graphical representations resulting from agglomerative hierarchical clustering and provide a framework for viewing the clustering at different levels of detail. However, dendrograms become cluttered when the dataset gets large, and the single cut of the dendrogram to demarcate different clusters can be insufficient in heterogeneous datasets. In this work, we propose a visual analytics methodology called MCLEAN that offers a general approach for guiding the user through the exploration and detection of clusters. Powered by a graph-based transformation of the relational data, it supports a scalable environment for representation of heterogeneous datasets by changing the spatialization. We thereby combine multilevel representations of the clustered dataset with community finding algorithms. Our approach entails displaying the results of the heuristics to users, providing a setting from which to start the exploration and data analysis. To evaluate our proposed approach, we conduct a qualitative user study, where participants are asked to explore a heterogeneous dataset, comparing the results obtained by MCLEAN with the dendrogram. These qualitative results reveal that MCLEAN is an effective way of aiding users in the detection of clusters in heterogeneous datasets. The proposed methodology is implemented in an R package available at https://bitbucket.org/vda-lab/mclean.
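The claim that a single dendrogram cut can be insufficient in heterogeneous data can be illustrated with a small sketch. This is not the MCLEAN implementation. It assumes a simple threshold graph, in which points are linked when their Euclidean distance is at most a chosen threshold, and it uses connected components as a stand-in for a community finding algorithm. The components of such a graph coincide with a single-linkage dendrogram cut at the same height. All function names and the toy data are hypothetical.

```python
from itertools import combinations
from math import dist


def threshold_graph(points, threshold):
    """Adjacency sets linking every pair of points at most `threshold` apart."""
    adjacency = {i: set() for i in range(len(points))}
    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= threshold:
            adjacency[i].add(j)
            adjacency[j].add(i)
    return adjacency


def connected_components(adjacency):
    """Group nodes into components with an iterative depth-first search.

    Assumption: components stand in for a real community finding algorithm.
    """
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            # The set difference builds a new set, so updating `seen` is safe.
            for neighbour in adjacency[node] - seen:
                seen.add(neighbour)
                stack.append(neighbour)
        components.append(sorted(component))
    return sorted(components)


def multilevel_clusters(points, thresholds):
    """Cluster the same points at several levels of detail."""
    return {t: connected_components(threshold_graph(points, t)) for t in thresholds}


# Hypothetical heterogeneous data: two tight groups and one loose group.
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),     # tight group A
    (1.0, 0.0), (1.1, 0.0), (1.0, 0.1),     # tight group B
    (10.0, 0.0), (12.0, 0.0), (14.0, 0.0),  # loose group C
]
levels = multilevel_clusters(points, thresholds=[0.2, 2.5])
for threshold, clusters in levels.items():
    print(threshold, clusters)
```

In this toy example the low threshold separates groups A and B but splits the loose group C into singletons, while the high threshold recovers C but merges A with B. No single threshold yields all three groups, which is why viewing several levels side by side can help.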