IEEE Trans Pattern Anal Mach Intell. 2018 Aug;40(8):2023-2029. doi: 10.1109/TPAMI.2017.2743707. Epub 2017 Aug 24.
Labeling samples is demanding and expensive. Active learning aims to build the smallest possible training data set that still yields a classifier with high performance in the test phase. It usually consists of two steps: selecting a set of queries and requesting their labels. Among the objectives proposed for scoring query sets, information-theoretic measures have become very popular. Of these, measures based on Fisher information (FI) have the advantages of accounting for diversity among the queries and of being computationally tractable. In this work, we provide a practical algorithm based on the Fisher information ratio to obtain a query distribution in a general framework where, in contrast to previous FI-based querying methods, we make no assumptions about the test distribution. Empirical results on synthetic and real-world data sets indicate that this algorithm gives competitive results.
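As a rough illustration of the quantity named in the abstract, the sketch below scores a query distribution by the Fisher information ratio in its commonly used form, tr(I_q(θ)^{-1} I_p(θ)). It is not the paper's algorithm. It assumes a binary logistic regression model, and the function names, the ridge term, the random pool, and the uniform reference distribution `p` are all hypothetical choices made for illustration.

```python
import numpy as np


def fisher_information(X, theta, weights, ridge=1e-6):
    """Weighted Fisher information of a binary logistic model at theta.

    X: (n, d) pool of samples; weights: (n,) distribution over the pool.
    The small ridge term is an assumption added to keep the matrix invertible.
    """
    probs = 1.0 / (1.0 + np.exp(-(X @ theta)))
    label_var = probs * (1.0 - probs)  # Bernoulli variance per sample
    scaled = X * (weights * label_var)[:, None]
    return scaled.T @ X + ridge * np.eye(X.shape[1])


def fisher_information_ratio(X, theta, q, p):
    """tr(I_q^{-1} I_p); FIR-based querying treats smaller values as better."""
    I_q = fisher_information(X, theta, q)
    I_p = fisher_information(X, theta, p)
    return float(np.trace(np.linalg.solve(I_q, I_p)))


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))        # hypothetical unlabeled pool
theta = rng.normal(size=3)          # hypothetical current parameter estimate
p = np.full(50, 1.0 / 50)           # placeholder reference distribution (assumption)
q = rng.dirichlet(np.ones(50))      # candidate query distribution

score = fisher_information_ratio(X, theta, q, p)
queries = rng.choice(50, size=5, replace=False, p=q)  # pool indices to label
```

When `q` equals `p`, the ratio reduces to the parameter dimension, which gives a convenient sanity check. A query distribution concentrated on very few samples leaves `I_q` nearly singular and drives the score up, which is one way to see how FI-based objectives reward diversity among queries.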