Yu Hualong, Yang Xibei, Zheng Shang, Sun Changyin
IEEE Trans Neural Netw Learn Syst. 2019 Apr;30(4):1088-1103. doi: 10.1109/TNNLS.2018.2855446. Epub 2018 Aug 21.
It is well known that active learning can simultaneously improve the quality of a classification model and decrease the complexity of the training instances. However, several previous studies have indicated that the performance of active learning is easily disrupted by an imbalanced data distribution. Some existing imbalanced active learning approaches also suffer from either low performance or high time consumption. To address these problems, this paper describes an efficient solution based on the extreme learning machine (ELM) classification model, called active online-weighted ELM (AOW-ELM). The main contributions of this paper are as follows: 1) the reasons why active learning can be disrupted by an imbalanced instance distribution, and the factors that influence this disruption, are discussed in detail; 2) hierarchical clustering is adopted to select the initially labeled instances in order to avoid the missed cluster effect and the cold start phenomenon as far as possible; 3) the weighted ELM (WELM) is selected as the base classifier to guarantee the impartiality of instance selection during active learning, and an efficient online update mode of WELM is derived theoretically; and 4) an early stopping criterion that is similar to, but more flexible than, the margin exhaustion criterion is presented. Experimental results on 32 binary-class data sets with different imbalance ratios demonstrate that the proposed AOW-ELM algorithm is more effective and efficient than several state-of-the-art active learning algorithms specifically designed for the class imbalance scenario.
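The weighted ELM named in contribution 3 can be illustrated with a minimal batch sketch. It assumes the commonly used WELM closed form, beta = (I/C + H^T W H)^{-1} H^T W T, with each instance weighted by the inverse of its class size. It does not reproduce the online update mode or the stopping criterion derived in the paper. The function names, the sigmoid activation, the defaults for n_hidden, C and seed, and the smallest-absolute-output query rule are assumptions made for illustration only.

```python
import numpy as np


def _hidden(X, A, b):
    # Sigmoid outputs of the random hidden layer (activation is an assumption).
    return 1.0 / (1.0 + np.exp(-(X @ A + b)))


def train_welm(X, y, n_hidden=50, C=1.0, seed=0):
    """Batch weighted ELM for labels y in {-1, +1} (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # Input weights and biases are drawn at random and never trained.
    A = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = _hidden(X, A, b)
    # Per-instance weight: inverse of the size of the instance's class,
    # so both classes contribute equally to the weighted loss.
    w = np.where(y == 1, 1.0 / np.sum(y == 1), 1.0 / np.sum(y == -1))
    HtW = H.T * w  # same as H^T W with W = diag(w)
    # Regularized weighted least squares for the output weights.
    beta = np.linalg.solve(np.eye(n_hidden) / C + HtW @ H, HtW @ y)
    return A, b, beta


def decision_values(model, X):
    # Real-valued outputs; the sign gives the predicted class.
    A, b, beta = model
    return _hidden(X, A, b) @ beta


def query_most_uncertain(model, X_pool):
    # Hypothetical query rule: pick the pool instance whose output is
    # closest to the decision boundary (smallest absolute value).
    return int(np.argmin(np.abs(decision_values(model, X_pool))))
```

In an active learning loop, the instance returned by query_most_uncertain would be labeled, moved from the pool to the training set, and the classifier updated. This sketch would retrain from scratch at that point, which is the cost that an online update is meant to avoid.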