IEEE J Biomed Health Inform. 2019 Nov;23(6):2238-2244. doi: 10.1109/JBHI.2018.2881155. Epub 2018 Nov 13.
Biomedical data, such as images and genetic sequences, are now abundant. However, most of this data lacks annotation because labeling is costly. Techniques that ease the burden of human annotation are therefore needed, and active learning strategies can serve this goal. However, state-of-the-art active learning methods are generally not feasible for real-world datasets. Another important issue, generally neglected by these methods, is that the classifier tends to learn more at each iteration; their selection criteria do not properly exploit the knowledge of the classifier. In this paper, we therefore propose an active learning approach that leverages the learning process, including a novel active learning strategy. The main difference of our strategy is that the classifier participates very actively in its own learning process. This lets us better maximize and prioritize the knowledge the classifier obtains at each iteration and use it more appropriately when selecting more informative samples. To do so, our selection criteria give significant weight to the classifications suggested by the classifier. In addition, together with the participation and knowledge of the classifier, we consider both uncertainty and representativeness criteria through a fine-grained analysis of the samples. Experimental results show that our active learning approach outperforms state-of-the-art active learning methods across several supervised classifiers. It thus handles real-world datasets better, balancing the tradeoff between annotation effort and higher accuracy rates.
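The abstract does not give the selection formulas, so the following is only a minimal sketch of the general idea it describes: ranking unlabeled samples by combining an uncertainty criterion with a representativeness criterion, while using the labels the classifier itself suggests. It is not the authors' strategy. The function names, the margin-based uncertainty, the centroid-based representativeness, and the mixing weight `alpha` are all assumptions introduced for illustration.

```python
import math


def margin_uncertainty(probs):
    """Return 1 - (top1 - top2); higher means the classifier is less certain.

    Assumes at least two classes.
    """
    top = sorted(probs, reverse=True)
    return 1.0 - (top[0] - top[1])


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def select_samples(features, probs, k, alpha=0.5):
    """Pick k unlabeled samples to send to the annotator (hypothetical criterion).

    features: list of feature vectors for the unlabeled pool
    probs:    class probabilities predicted by the current classifier
    alpha:    assumed weight between uncertainty (1.0) and representativeness (0.0)
    """
    # Labels suggested by the classifier (ties go to the lowest class index).
    preds = [max(range(len(p)), key=p.__getitem__) for p in probs]

    # Group the pool by suggested label and compute one centroid per group.
    groups = {}
    for x, y in zip(features, preds):
        groups.setdefault(y, []).append(x)
    centroids = {y: centroid(xs) for y, xs in groups.items()}

    scored = []
    for i, (x, p, y) in enumerate(zip(features, probs, preds)):
        uncertainty = margin_uncertainty(p)
        # Samples close to the centroid of their suggested class count as
        # more representative; the score lies in (0, 1].
        representativeness = 1.0 / (1.0 + math.dist(x, centroids[y]))
        score = alpha * uncertainty + (1.0 - alpha) * representativeness
        scored.append((score, i))

    # Highest score first; ties are broken by the original pool index.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [i for _, i in scored[:k]]
```

In an active learning loop, the returned indices would be labeled by the annotator, moved to the training set, and the classifier retrained before the next call. Setting `alpha` to 1.0 or 0.0 reduces the sketch to pure uncertainty sampling or pure representativeness sampling, respectively.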