Xue Yanbing, Hauskrecht Milos
Department of Computer Science, University of Pittsburgh, Pittsburgh, PA.
Proc Int Fla AI Res Soc Conf. 2018 May;2018:158-163.
Our ability to learn accurate classification models from data is often limited by the number of available data instances. This limitation is of particular concern when data instances need to be labeled by humans and when the labeling process carries a significant cost. Recent years have witnessed increased research interest in developing methods capable of learning models from a smaller number of examples. One such direction is active learning. Another, more recent direction showing great promise utilizes auxiliary probabilistic information in addition to class labels. However, this direction has been applied and tested only in binary classification settings. In this work, we first develop a multi-class variant of the auxiliary probabilistic approach and then embed it within an active learning framework, effectively combining two strategies for reducing the dependency of multi-class classification learning on the number of labeled examples. We demonstrate the effectiveness of our new approach on both simulated and real-world datasets.
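The abstract combines two ideas: training a multi-class classifier from auxiliary probabilistic (soft-label) feedback, and choosing which instances to label via active learning. The sketch below is only an illustration of how these two ingredients can fit together, not the paper's algorithm: it fits a plain softmax-regression model to soft class distributions and queries the unlabeled pool by predictive entropy. The `probabilistic_label` helper is hypothetical and merely simulates noisy auxiliary feedback on toy data.

```python
import numpy as np

rng = np.random.default_rng(0)


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def fit_softmax_regression(X, P, lr=0.1, epochs=500, l2=1e-3):
    """Fit a multi-class softmax (multinomial logistic) model by gradient
    descent, minimizing cross-entropy against *soft* targets P, where each
    row of P is a probability distribution over classes, not a one-hot label."""
    n, d = X.shape
    k = P.shape[1]
    W, b = np.zeros((d, k)), np.zeros(k)
    for _ in range(epochs):
        Q = softmax(X @ W + b)      # model's predicted class probabilities
        G = (Q - P) / n             # gradient of mean cross-entropy w.r.t. logits
        W -= lr * (X.T @ G + l2 * W)
        b -= lr * G.sum(axis=0)
    return W, b


def predict_proba(X, W, b):
    return softmax(X @ W + b)


# Toy data: three Gaussian clusters, one per class.
centers = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
X = np.vstack([rng.normal(c, 1.0, size=(200, 2)) for c in centers])
y = np.repeat(np.arange(3), 200)


def probabilistic_label(i, noise=1.0):
    """Hypothetical stand-in for auxiliary probabilistic feedback: instead of
    a hard label, the annotator returns a noisy distribution over classes."""
    p = np.full(3, 0.1)
    p[y[i]] = 1.0
    p = p + noise * rng.random(3)
    return p / p.sum()


# Uncertainty-based active learning loop over an unlabeled pool.
labeled = [int(i) for i in rng.choice(len(X), size=9, replace=False)]
pool = [i for i in range(len(X)) if i not in labeled]
soft_labels = {i: probabilistic_label(i) for i in labeled}

for _ in range(10):
    W, b = fit_softmax_regression(X[labeled],
                                  np.vstack([soft_labels[i] for i in labeled]))
    # Query the pool instance whose predictive distribution has the highest
    # entropy, i.e. the instance the current model is most uncertain about.
    Q = predict_proba(X[pool], W, b)
    entropy = -(Q * np.log(Q + 1e-12)).sum(axis=1)
    pick = pool.pop(int(entropy.argmax()))
    soft_labels[pick] = probabilistic_label(pick)
    labeled.append(pick)

W, b = fit_softmax_regression(X[labeled],
                              np.vstack([soft_labels[i] for i in labeled]))
acc = (predict_proba(X, W, b).argmax(axis=1) == y).mean()
print(f"accuracy with {len(labeled)} probabilistically labeled instances: {acc:.3f}")
```

The entropy-based query rule and the soft-label cross-entropy objective are generic choices made for this sketch; the paper's actual multi-class formulation and acquisition strategy may differ.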