Hu Weiming, Hu Wei, Xie Nianhua, Maybank Steve
National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China.
IEEE Trans Syst Man Cybern B Cybern. 2009 Oct;39(5):1147-61. doi: 10.1109/TSMCB.2009.2013197. Epub 2009 Mar 24.
Most existing active learning approaches are supervised. Supervised active learning has three problems: it deals inefficiently with the semantic gap between the distribution of samples in the feature space and their labels, it cannot select new samples that belong to categories that have not yet appeared in the training samples, and it adapts poorly to changes in the semantic interpretation of sample categories. To tackle these problems, we propose an unsupervised active learning framework based on hierarchical graph-theoretic clustering. In the framework, two graph-theoretic clustering algorithms, dominant-set clustering and spectral clustering, are combined in a hierarchical fashion. The framework is easy to implement, flexible in architecture, and adaptable to changes in the labeling. Evaluations on data sets for network intrusion detection, image classification, and video classification demonstrate that the framework effectively reduces the workload of manual classification while maintaining a high accuracy of automatic classification. Overall, the framework outperforms support-vector-machine-based supervised active learning, particularly in handling, much more efficiently, new samples whose categories have not yet appeared in the training samples.
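The general idea of cluster-then-label active learning can be illustrated with a minimal sketch. It is not the authors' implementation: it covers only a single dominant-set stage (extracted with replicator dynamics and a peel-off strategy) and omits the spectral clustering stage and the hierarchical combination described in the abstract. The Gaussian affinity, the `sigma`, `cutoff`, and `tol` parameters, the function names, and the toy data are all assumptions made for illustration.

```python
import math


def affinity(points, sigma=1.0):
    """Gaussian affinity matrix with a zero diagonal (assumed similarity measure)."""
    n = len(points)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            A[i][j] = A[j][i] = math.exp(-d2 / (2 * sigma ** 2))
    return A


def dominant_set(A, idx, iters=500, tol=1e-9, cutoff=1e-4):
    """Run replicator dynamics on the sub-graph `idx`.

    Returns (index, weight) pairs for the vertices whose weight stays above `cutoff`.
    """
    m = len(idx)
    x = [1.0 / m] * m  # start from the barycenter of the simplex
    for _ in range(iters):
        Ax = [sum(A[idx[i]][idx[j]] * x[j] for j in range(m)) for i in range(m)]
        total = sum(x[i] * Ax[i] for i in range(m))
        if total <= 0.0:  # no positive affinity left, keep the current weights
            break
        new = [x[i] * Ax[i] / total for i in range(m)]
        delta = sum(abs(a - b) for a, b in zip(new, x))
        x = new
        if delta < tol:
            break
    members = [(idx[i], x[i]) for i in range(m) if x[i] > cutoff]
    if not members:  # guard: always return at least the heaviest vertex
        best = max(range(m), key=lambda i: x[i])
        members = [(idx[best], x[best])]
    return members


def label_by_clusters(points, oracle, sigma=1.0):
    """Peel off dominant sets one by one and query the oracle once per cluster."""
    A = affinity(points, sigma)
    remaining = list(range(len(points)))
    labels = [None] * len(points)
    clusters, queries = [], 0
    while remaining:
        members = dominant_set(A, remaining)
        representative = max(members, key=lambda iw: iw[1])[0]
        label = oracle(representative)  # the only manual labeling step
        queries += 1
        ids = [i for i, _ in members]
        for i in ids:
            labels[i] = label  # propagate the label to the whole cluster
        clusters.append(ids)
        chosen = set(ids)
        remaining = [i for i in remaining if i not in chosen]
    return labels, clusters, queries


# Toy data (hypothetical): two well-separated groups of different sizes.
points = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (0.2, 0.2), (0.1, 0.1),
          (5.0, 5.0), (5.2, 5.0), (5.0, 5.2)]
truth = ["a"] * 5 + ["b"] * 3
labels, clusters, queries = label_by_clusters(points, lambda i: truth[i])
print(labels, clusters, queries)
```

In this sketch the oracle is asked for one label per extracted cluster instead of one per sample, which is the source of the reduction in manual labeling effort. Because clusters are found without labels, a group belonging to a category that has never been labeled still surfaces as its own cluster and receives its own query.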