IEEE Trans Vis Comput Graph. 2018 Jan;24(1):298-308. doi: 10.1109/TVCG.2017.2744818. Epub 2017 Aug 29.
Labeling data instances is an important task in machine learning and visual analytics. Both fields provide a broad set of labeling strategies: machine learning (in particular active learning) follows a largely model-centered approach, whereas visual analytics employs more user-centered approaches (visual-interactive labeling). Both approaches have individual strengths and weaknesses. In this work, we conduct an experiment with three parts to assess and compare the performance of these different labeling strategies. In our study, we (1) identify different visual labeling strategies for user-centered labeling, (2) investigate strengths and weaknesses of labeling strategies for different labeling tasks and task complexities, and (3) shed light on the effect of using different visual encodings to guide the visual-interactive labeling process. We further compare labeling of single versus multiple instances at a time, and quantify the impact on efficiency. We systematically compare the performance of visual-interactive labeling with that of active learning. Our main findings are that visual-interactive labeling can outperform active learning, provided that dimension reduction separates the class distributions well. Moreover, combining dimension reduction with additional visual encodings that expose the internal state of the learning model improves the performance of visual-interactive labeling.