IEEE Trans Image Process. 2021;30:9280-9293. doi: 10.1109/TIP.2021.3124317. Epub 2021 Nov 12.
Most existing unsupervised active learning methods aim to minimize the data reconstruction loss by using linear models to choose representative samples for manual labeling in an unsupervised setting. These methods therefore often fail to model data with complex non-linear structure. To address this issue, we propose a new deep unsupervised Active Learning method for classification tasks, inspired by the idea of Matrix Sketching and called ALMS. Specifically, ALMS leverages a deep auto-encoder to embed data into a latent space, and then describes all the embedded data with a small sketch that summarizes the major characteristics of the data. In contrast to previous approaches that reconstruct the whole data matrix to select representative samples, ALMS selects a representative subset of samples that closely approximates the sketch, which preserves the major information of the data while significantly reducing the number of network parameters. This allows our algorithm to alleviate model overfitting and readily cope with large datasets. In effect, the sketch provides a type of self-supervised signal that guides the learning of the model. Moreover, we propose to construct an auxiliary self-supervised task, classifying real/fake samples, to further improve the representation ability of the encoder. We thoroughly evaluate the performance of ALMS on both single-label and multi-label classification tasks, and the results demonstrate its superior performance against state-of-the-art methods. The code can be found at https://github.com/lrq99/ALMS.
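The core idea, summarizing embedded data with a small sketch and then choosing samples whose span approximates that sketch, can be illustrated with a toy NumPy sketch. This is not the authors' implementation: the SVD-based sketch, the random "latent embeddings" `Z`, and the greedy selection rule are all simplifying assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical latent embeddings (stand-in for auto-encoder output):
# n = 500 samples, d = 16 latent dimensions.
Z = rng.normal(size=(500, 16))

# A small sketch of Z via truncated SVD: keep the top-k right singular
# directions scaled by their singular values (a k x d summary matrix).
k = 4
_, s, Vt = np.linalg.svd(Z, full_matrices=False)
S = s[:k, None] * Vt[:k]  # sketch matrix, shape (k, d)

def greedy_select(Z, S, m):
    """Greedily pick m samples whose span approximates the sketch rows.

    Each step adds the sample most aligned with the current residual of
    the sketch, then orthogonalizes the residual against that sample.
    """
    chosen = []
    R = S.copy()  # residual of the sketch not yet covered
    for _ in range(m):
        scores = np.abs(R @ Z.T).sum(axis=0)  # alignment with residual
        scores[chosen] = -np.inf              # never re-pick a sample
        i = int(np.argmax(scores))
        chosen.append(i)
        q = Z[i] / np.linalg.norm(Z[i])
        R = R - np.outer(R @ q, q)            # remove covered direction
    return chosen

picked = greedy_select(Z, S, m=10)

# Residual of the sketch after projecting onto the chosen samples'
# row space; it shrinks as representative samples are added.
Zp = Z[picked]
residual = np.linalg.norm(S - S @ np.linalg.pinv(Zp) @ Zp)
print(len(picked), residual, np.linalg.norm(S))
```

Only the selected subset is then sent for manual labeling, which is what keeps the labeling budget and, in the paper's formulation, the number of parameters small compared with reconstructing the full data matrix.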