Inception Institute of AI, Abu Dhabi, United Arab Emirates; Faculty of IT, Monash University, Melbourne, Australia.
School of Computing Technologies, RMIT University, Melbourne, Australia.
Med Image Anal. 2024 Oct;97:103261. doi: 10.1016/j.media.2024.103261. Epub 2024 Jul 4.
State-of-the-art deep learning models often fail to generalize when there is a distribution shift between training (source) data and test (target) data. Domain adaptation methods are designed to address this issue using labeled target samples (supervised domain adaptation) or unlabeled target samples (unsupervised domain adaptation). Active learning selects the most informative samples so as to obtain maximum performance from a minimum number of annotations. Selecting informative target-domain samples can improve model performance and robustness, and reduce data demands. This paper proposes a novel pipeline called ALFREDO (Active Learning with FeatuRe disEntanglement and DOmain adaptation) that performs active learning under domain shift. We propose a novel feature disentanglement approach that decomposes image features into domain-specific and task-specific components. Domain-specific components are features that carry source-specific information, e.g., scanners, vendors or hospitals. Task-specific components are discriminative features for classification, segmentation or other tasks. We then define multiple novel cost functions that identify informative samples under domain shift. We test the proposed method on medical image classification using one histopathology dataset and two chest X-ray datasets. Experiments show that our method achieves state-of-the-art results compared with other domain adaptation methods, as well as with state-of-the-art active domain adaptation methods.
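The abstract does not specify the cost functions, so the following is only a minimal illustrative sketch of how informative target samples might be ranked once features have been split into task-specific and domain-specific parts. Everything in it is an assumption made for illustration: the function names, the use of predictive entropy as a task-uncertainty term, the Euclidean distance of domain-specific features from a source-domain mean as a "domainness" term, and the `weight` parameter. None of it is taken from the ALFREDO paper.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def domain_distance(z_dom: Sequence[float], source_mean: Sequence[float]) -> float:
    """Euclidean distance between domain-specific features and the source mean."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(z_dom, source_mean)))


def select_informative(
    task_probs: Sequence[Sequence[float]],
    domain_feats: Sequence[Sequence[float]],
    source_mean: Sequence[float],
    budget: int,
    weight: float = 0.5,
) -> List[int]:
    """Return indices of the `budget` highest-scoring target samples.

    Hypothetical score: task uncertainty + weight * distance from the source domain.
    This stands in for the paper's cost functions, which the abstract does not give.
    """
    scores = [
        entropy(p) + weight * domain_distance(z, source_mean)
        for p, z in zip(task_probs, domain_feats)
    ]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:budget]


# Toy target pool: class probabilities from the task-specific branch and
# 2-D domain-specific features (all values made up for illustration).
task_probs = [[0.5, 0.5], [0.9, 0.1], [0.99, 0.01]]
domain_feats = [[0.0, 0.0], [3.0, 4.0], [0.0, 0.0]]
source_mean = [0.0, 0.0]

picked = select_informative(task_probs, domain_feats, source_mean, budget=2)
print(picked)
```

In this toy pool, the second sample is ranked first because its domain-specific features lie far from the source mean. The first sample follows because its prediction is maximally uncertain. The selected indices would then be sent for annotation and added to the labeled set before the next training round.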