Madhawa Kaushalya, Murata Tsuyoshi
Department of Computer Science, Tokyo Institute of Technology, Tokyo 152-8552, Japan.
Entropy (Basel). 2020 Oct 16;22(10):1164. doi: 10.3390/e22101164.
Current breakthroughs in the field of machine learning are fueled by the deployment of deep neural network models. Deep neural network models are notorious for their dependence on large amounts of labeled data for training. Active learning is used as a solution to train classification models with fewer labeled instances by selecting only the most informative instances for labeling. This is especially important when labeled data are scarce or the labeling process is expensive. In this paper, we study the application of active learning on attributed graphs. In this setting, the data instances are represented as nodes of an attributed graph. Graph neural networks achieve the current state-of-the-art classification performance on attributed graphs. The performance of graph neural networks relies on careful tuning of their hyperparameters, usually performed using a validation set, an additional set of labeled instances. In label-scarce problems, it is more realistic to use all labeled instances for training the model rather than setting some aside for validation. In this setting, we perform a fair comparison of existing active learning algorithms proposed for graph neural networks as well as for other data types such as images and text. With empirical results, we demonstrate that state-of-the-art active learning algorithms designed for other data types do not perform well on graph-structured data. We study the problem within the framework of the exploration-vs.-exploitation trade-off and propose a new count-based exploration term. With empirical evidence on multiple benchmark graphs, we highlight the importance of complementing uncertainty-based active learning models with an exploration term.
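The abstract does not spell out the form of the acquisition function or the count-based exploration term. Purely as an illustrative sketch, and not the authors' actual method, the Python snippet below combines a standard uncertainty term (predictive entropy of a GNN's softmax outputs) with a hypothetical count-based bonus that favors graph regions in which few nodes have been labeled so far; the names acquisition_scores, cluster_ids, and beta are assumptions introduced here for illustration only:

    import numpy as np

    def predictive_entropy(probs, eps=1e-12):
        # Uncertainty term: entropy of each node's predicted class distribution.
        return -np.sum(probs * np.log(probs + eps), axis=1)

    def acquisition_scores(probs, cluster_ids, labeled_mask, beta=1.0):
        # Exploitation: prefer nodes the model is most uncertain about.
        uncertainty = predictive_entropy(probs)
        # Exploration (hypothetical count-based bonus): regions of the graph
        # with few labeled nodes so far receive a 1/sqrt(count + 1) bonus.
        n_regions = int(cluster_ids.max()) + 1
        label_counts = np.bincount(cluster_ids[labeled_mask], minlength=n_regions)
        exploration = 1.0 / np.sqrt(label_counts[cluster_ids] + 1.0)
        scores = uncertainty + beta * exploration
        scores[labeled_mask] = -np.inf  # never re-query already-labeled nodes
        return scores

    # Toy usage with random "GNN" outputs and a random graph partition.
    rng = np.random.default_rng(0)
    probs = rng.dirichlet(np.ones(3), size=10)    # softmax outputs for 10 nodes, 3 classes
    cluster_ids = rng.integers(0, 3, size=10)     # community id of each node
    labeled_mask = np.zeros(10, dtype=bool)
    labeled_mask[[0, 1]] = True                   # two seed labels
    next_node = int(np.argmax(acquisition_scores(probs, cluster_ids, labeled_mask)))

The additive uncertainty-plus-bonus form mirrors UCB-style scores from the exploration-vs.-exploitation literature; the paper's actual exploration term may be defined differently.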