Xu Huiwen, Lee Jaeri, Kang U
Data Mining Lab, Seoul National University, Seoul, Republic of Korea.
PLoS One. 2025 May 12;20(5):e0321987. doi: 10.1371/journal.pone.0321987. eCollection 2025.
How can we perform unsupervised domain adaptation when transferring a black-box source model to a target domain? Black-box Unsupervised Domain Adaptation focuses on transferring the labels derived from a pre-trained black-box source model to an unlabeled target domain. The problem setting is motivated by privacy concerns associated with accessing and utilizing source data or source model parameters. Recent studies typically train the target model by mimicking the labels derived from the black-box source model, which are often noisy because of the domain gap between the source and the target. Either exploiting such noisy labels directly or discarding them can degrade the target model's performance. We propose Threshold-Based Exploitation of Noisy Predictions (TEN), a method to accurately learn the target model with noisy labels in Black-box Unsupervised Domain Adaptation. To preserve the information from the black-box source model, we employ a threshold-based approach to distinguish between clean labels and noisy labels, which allows high-confidence knowledge to be transferred from both kinds of labels. We utilize a flexible thresholding approach that adjusts the threshold for each class, thereby obtaining an adequate amount of clean data for hard-to-learn classes. We also exploit knowledge distillation for clean labels and negative learning for noisy labels to extract high-confidence information. Extensive experiments show that TEN outperforms baselines with an accuracy improvement of up to 9.49%.
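The pipeline described in the abstract (per-class thresholds, a clean/noisy split, knowledge distillation on clean labels, negative learning on noisy ones) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract does not give the exact rules, so several details here are assumptions: the linear per-class threshold rule and its `base_tau` and `min_tau` parameters, the choice of the least likely source class as the complementary label, and all function names are hypothetical.

```python
import numpy as np


def per_class_thresholds(source_probs, base_tau=0.9, min_tau=0.5):
    """Assumed rule: lower the threshold for classes that rarely
    receive confident source predictions (hard-to-learn classes)."""
    n_classes = source_probs.shape[1]
    conf = source_probs.max(axis=1)
    pred = source_probs.argmax(axis=1)
    # Count confident predictions per class.
    counts = np.bincount(pred[conf >= base_tau], minlength=n_classes).astype(float)
    # Normalise by the best-represented class, giving values in [0, 1].
    status = counts / max(counts.max(), 1.0)
    # Interpolate linearly between min_tau and base_tau.
    return min_tau + (base_tau - min_tau) * status


def split_clean_noisy(source_probs, thresholds):
    """A sample is 'clean' if its top source probability reaches the
    threshold of its predicted class; otherwise it is 'noisy'."""
    conf = source_probs.max(axis=1)
    pred = source_probs.argmax(axis=1)
    return conf >= thresholds[pred]


def ten_style_loss(target_probs, source_probs, thresholds, eps=1e-8):
    """Distillation loss on clean samples plus negative-learning loss
    on noisy samples, averaged over the batch."""
    clean = split_clean_noisy(source_probs, thresholds)

    # Knowledge distillation: cross-entropy between the source soft
    # labels and the target model's predictions.
    kd = -(source_probs[clean] * np.log(target_probs[clean] + eps)).sum(axis=1)

    # Negative learning: push the target probability of a class the
    # sample most likely does NOT belong to towards zero.
    # Assumption: the complementary label is the least likely source class.
    noisy_source = source_probs[~clean]
    noisy_target = target_probs[~clean]
    comp = noisy_source.argmin(axis=1)
    p_comp = noisy_target[np.arange(len(comp)), comp]
    neg = -np.log(1.0 - p_comp + eps)

    return (kd.sum() + neg.sum()) / len(target_probs)


# Toy batch: three confident source predictions and one ambiguous one.
source_probs = np.array([
    [0.95, 0.03, 0.02],
    [0.97, 0.02, 0.01],
    [0.05, 0.92, 0.03],
    [0.40, 0.35, 0.25],
])
thresholds = per_class_thresholds(source_probs)
clean_mask = split_clean_noisy(source_probs, thresholds)
loss = ten_style_loss(source_probs, source_probs, thresholds)
```

In this toy batch the ambiguous fourth sample falls below the threshold of its predicted class, so it contributes only the negative-learning term: the loss says which class it is probably not, rather than forcing it to match an unreliable positive label.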