Luo Zhipeng, Hauskrecht Milos
Department of Computer Science, University of Pittsburgh, Pittsburgh, Pennsylvania.
Proc ACM Int Conf Inf Knowl Manag. 2020 Oct;2020:1045-1054. doi: 10.1145/3340531.3412022.
Learning classification models from real-world data often requires substantial human annotation effort. Because this process can be very time-consuming and costly, finding effective ways to reduce annotation cost is critical for building such models. To address this problem we explore a new type of human feedback: region-based feedback. Briefly, a region is defined as a hypercubic subspace of the input data space and represents a subpopulation of data instances; the region's label is a human assessment of the class of that subpopulation. Using appropriate algorithms, one can learn instance-based classifiers from such labeled regions. In general, the key challenge is that there are infinitely many regions one can define and query in a given data space. To minimize the number and complexity of region-based queries, we propose and develop a solution that incrementally builds a hierarchy of regions. Furthermore, to avoid building a region hierarchy that may be irrelevant to the class, we propose growing multiple different hierarchies in parallel and expanding the more informative ones. Through experiments on numerous data sets, we demonstrate that methods using region-based feedback can learn very good classifiers from very few and simple queries, and hence are highly effective in reducing the human annotation effort needed to build classification models.
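To make the notion of a region and a region hierarchy concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes axis-aligned boxes, binary splits on a single feature, and a label stored as an estimated proportion of the positive class. The names `Region`, `contains`, `split`, and `leaves`, the split threshold, and the label values are all hypothetical; the abstract does not specify how regions are split or how labels are encoded.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple


@dataclass
class Region:
    """Axis-aligned box: lower[i] <= x[i] < upper[i] for every feature i."""
    lower: Tuple[float, ...]
    upper: Tuple[float, ...]
    # Assumption: the annotator's answer is stored as an estimated
    # proportion of positive instances inside the region.
    label: Optional[float] = None
    children: List["Region"] = field(default_factory=list)

    def contains(self, x: Sequence[float]) -> bool:
        """Return True if point x falls inside this box."""
        return all(lo <= v < hi for lo, v, hi in zip(self.lower, x, self.upper))

    def split(self, dim: int, threshold: float) -> Tuple["Region", "Region"]:
        """Grow the hierarchy by cutting this region in two along one feature."""
        if not (self.lower[dim] < threshold < self.upper[dim]):
            raise ValueError("threshold must lie strictly inside the region")
        left_upper = self.upper[:dim] + (threshold,) + self.upper[dim + 1:]
        right_lower = self.lower[:dim] + (threshold,) + self.lower[dim + 1:]
        left = Region(self.lower, left_upper)
        right = Region(right_lower, self.upper)
        self.children = [left, right]
        return left, right

    def leaves(self) -> List["Region"]:
        """Return the finest regions currently in the hierarchy."""
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]


# Usage: start from the whole (unit-square) input space and refine it once.
root = Region(lower=(0.0, 0.0), upper=(1.0, 1.0))
left, right = root.split(dim=0, threshold=0.5)
left.label = 0.9   # made-up annotator answers, for illustration only
right.label = 0.1

point = (0.25, 0.75)
owner = next(r for r in root.leaves() if r.contains(point))
print(owner.label)
```

In this sketch each query to the annotator corresponds to one leaf region, so the number of queries grows with the number of splits rather than with the number of data instances. Growing several such hierarchies in parallel, as the abstract proposes, would amount to keeping several `root` objects and choosing which one to split next.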