Liu Xiao-Qian, Zhang Peng-Fei, Luo Xin, Huang Zi, Xu Xin-Shun
IEEE Trans Image Process. 2024;33:6550-6563. doi: 10.1109/TIP.2024.3492705. Epub 2024 Nov 19.
Unsupervised Domain Adaptation (UDA) has shown promise in Scene Text Recognition (STR) by facilitating knowledge transfer from labeled synthetic text (source) to more challenging unlabeled real scene text (target). However, existing UDA-based STR methods rely entirely on the pseudo-labels of target samples, ignoring the impact of domain gaps (inter-domain noise) and varied natural environments (intra-domain noise), which results in poor pseudo-label quality. In this paper, we propose a novel noise-aware unsupervised domain adaptation framework tailored for STR, which aims to enhance model robustness against both inter- and intra-domain noise, thereby providing more precise pseudo-labels for target samples. Concretely, we propose reweighting target pseudo-labels by estimating the entropy of refined probability distributions, which mitigates the impact of domain gaps on pseudo-labels. Additionally, a decoupled triple-P-N consistency matching module is proposed, which leverages data augmentation to increase data diversity, enhancing model robustness in diverse natural environments. Within this module, we design a low-confidence-based character negative learning scheme, decoupled from high-confidence-based positive learning, thus improving sample utilization when target samples are scarce. Furthermore, we extend our framework to the more challenging Source-Free UDA (SFUDA) setting, where only a pre-trained source model is available for adaptation, with no access to source data. Experimental results on benchmark datasets demonstrate the effectiveness of our framework. Under the SFUDA setting, our method exhibits faster convergence and superior performance with less training data than previous UDA-based STR methods. Our method surpasses representative STR methods, establishing new state-of-the-art results across multiple datasets.
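The entropy-based pseudo-label reweighting described above can be sketched as follows. This is a minimal illustration under assumptions, not the authors' implementation: the function name, the normalization by log of the class count, and the linear weight form are all assumed for clarity. The idea is that a target prediction with a low-entropy (confident) distribution receives a higher weight in the pseudo-label loss, while a high-entropy (uncertain) one is down-weighted.

```python
import numpy as np

def entropy_weight(probs, eps=1e-12):
    """Map a per-character probability distribution to a weight in [0, 1].

    Low-entropy (confident) predictions get a weight near 1; a uniform
    (maximally uncertain) distribution gets a weight of 0. The entropy is
    normalized by log(num_classes) so the weight is scale-free.
    """
    probs = np.asarray(probs, dtype=float)
    num_classes = probs.shape[-1]
    entropy = -np.sum(probs * np.log(probs + eps), axis=-1)
    max_entropy = np.log(num_classes)
    return 1.0 - entropy / max_entropy

# A confident distribution is weighted much higher than a uniform one.
confident = entropy_weight([0.97, 0.01, 0.01, 0.01])
uniform = entropy_weight([0.25, 0.25, 0.25, 0.25])
```

In practice such a weight would multiply each target sample's pseudo-label loss term, so that predictions distorted by the domain gap (which tend to have flatter, higher-entropy distributions) contribute less to adaptation.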