IEEE Trans Pattern Anal Mach Intell. 2022 Dec;44(12):9774-9786. doi: 10.1109/TPAMI.2021.3129793. Epub 2022 Nov 7.
We propose a novel unified framework for automated distributed active learning (AutoDAL) that addresses several challenging problems in active learning: limited labeled data, imbalanced datasets, automatic hyperparameter selection, and scalability to big data. First, automated graph-based semi-supervised learning is conducted by aggregating the proposed cost functions from different compute nodes and jointly optimizing hyperparameters in both the classification and query selection stages. For dense datasets, a clustering-based uncertainty sampling with maximum entropy (CME) loss is applied in the optimization. For sparse and imbalanced datasets, shrinkage-optimized KL-divergence regularization and a local-selection-based active learning (SOAR) loss are further adapted within AutoDAL. The optimization is solved efficiently by iterating between a genetic algorithm (GA) refined with a local generating set search (GSS) and an integer linear programming (ILP) problem. Moreover, we propose an efficient distributed active learning algorithm that scales to big data. The proposed AutoDAL algorithm is applied to classification on multiple benchmark datasets and two real-world datasets: an electrocardiogram (ECG) dataset and a credit fraud detection dataset. We demonstrate that AutoDAL achieves significantly better performance than several state-of-the-art AutoML approaches and active learning algorithms.
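As a rough illustration of the uncertainty-sampling idea the abstract mentions, the sketch below ranks unlabeled samples by the entropy of their predicted class probabilities and queries the most uncertain ones. It is not the paper's CME loss, which additionally uses clustering and is optimized jointly with the classifier hyperparameters. The function name `entropy_query`, the parameter `k`, and the toy probabilities are assumptions made for this example only.

```python
import numpy as np

def entropy_query(probs, k):
    """Pick the k samples whose predicted class distribution has the highest entropy.

    Hypothetical helper for illustration; `probs` is an (n_samples, n_classes)
    array of predicted class probabilities for the unlabeled pool.
    """
    probs = np.asarray(probs, dtype=float)
    # Clip to avoid log(0); per-row entropy is H(p) = -sum_c p_c * log(p_c).
    clipped = np.clip(probs, 1e-12, 1.0)
    entropy = -np.sum(clipped * np.log(clipped), axis=1)
    # Stable sort in descending order of entropy, then keep the first k indices.
    order = np.argsort(-entropy, kind="stable")
    return order[:k], entropy

# Toy pool of four unlabeled samples for a binary classifier (made-up values).
probs = np.array([
    [0.98, 0.02],  # confident prediction, low entropy
    [0.55, 0.45],  # near the decision boundary
    [0.80, 0.20],
    [0.50, 0.50],  # maximally uncertain
])
picked, entropy = entropy_query(probs, k=2)
print(picked)  # indices of the two most uncertain samples
```

With these toy values the two samples closest to a uniform prediction are selected for labeling. A clustering-based variant would typically also spread the queries across clusters so that the selected batch is not redundant.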