Rauber Paulo E, Falcão Alexandre X, Telea Alexandru C
Department of Mathematics and Computing Science, University of Groningen, Groningen, The Netherlands.
University of Campinas, Campinas, Brazil.
Inf Vis. 2018 Oct;17(4):282-305. doi: 10.1177/1473871617713337. Epub 2017 Jun 27.
Dimensionality reduction is a compelling alternative for high-dimensional data visualization. This method provides insight into high-dimensional feature spaces by mapping relationships between observations (high-dimensional vectors) to low-dimensional (two- or three-dimensional) spaces. These low-dimensional representations support tasks such as outlier and group detection based on direct visualization. Supervised learning, a subfield of machine learning, is also concerned with observations. A key task in supervised learning consists of assigning class labels to observations based on generalization from previous experience. Effective development of such classification systems depends on many choices, including feature descriptors, learning algorithms, and hyperparameters. These choices are not trivial, and there is no simple recipe for improving classification systems that perform poorly. In this context, we first propose the use of visual representations based on dimensionality reduction (projections) for predictive feedback on classification efficacy. Second, we propose a projection-based visual analytics methodology, with supporting tooling, that can be used to improve classification systems through feature selection. We evaluate our proposal through experiments involving four datasets and three representative learning algorithms.
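To make the projection idea concrete, the following Python sketch maps labeled high-dimensional observations to two dimensions and checks how well classes stay together in the projected space. It is an illustration under stated assumptions, not the authors' method or tooling: PCA (approximated here by power iteration with deflation) stands in for whichever projection technique the paper uses, the data are synthetic, and neighbor_label_agreement is a hypothetical helper that serves only as a rough numeric proxy for the visual class separation one would look for in a scatterplot.

```python
import random


def pca_project_2d(rows, iters=200, seed=0):
    """Project rows onto (approximately) their top two principal directions."""
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]

    def top_direction(matrix):
        # Power iteration: repeatedly multiply and renormalize a random vector.
        v = [rng.gauss(0, 1) for _ in range(d)]
        for _ in range(iters):
            w = [sum(matrix[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = sum(x * x for x in w) ** 0.5
            v = [x / norm for x in w]
        mv = [sum(matrix[a][b] * v[b] for b in range(d)) for a in range(d)]
        eigenvalue = sum(v[a] * mv[a] for a in range(d))
        return v, eigenvalue

    axes = []
    for _ in range(2):
        v, lam = top_direction(cov)
        axes.append(v)
        # Deflation: remove the direction just found before searching again.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)]
               for a in range(d)]
    return [[sum(c[j] * axis[j] for j in range(d)) for axis in axes]
            for c in centered]


def neighbor_label_agreement(points, labels, k=5):
    """Mean share of each point's k nearest neighbors that share its label."""
    total = 0.0
    for i, p in enumerate(points):
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(p, q)), j)
            for j, q in enumerate(points) if j != i
        )
        same = sum(1 for _, j in dists[:k] if labels[j] == labels[i])
        total += same / k
    return total / len(points)


# Synthetic data (an assumption for illustration): two classes in five
# dimensions whose means differ only in the first two features.
rng = random.Random(0)
rows, labels = [], []
for label, center in enumerate([(0, 0, 0, 0, 0), (4, 4, 0, 0, 0)]):
    for _ in range(40):
        rows.append([c + rng.gauss(0, 1) for c in center])
        labels.append(label)

projected = pca_project_2d(rows)
score = neighbor_label_agreement(projected, labels, k=5)
print(f"mean same-class share among 5 nearest neighbors: {score:.2f}")
```

A score near 1 suggests that classes form distinct groups in the projection, while a score near the chance level suggests overlap. Repeating the computation on different feature subsets is one simple way to explore how feature selection changes the picture.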