IEEE Trans Pattern Anal Mach Intell. 2023 May;45(5):6231-6246. doi: 10.1109/TPAMI.2022.3205748. Epub 2023 Apr 3.
The feature extractor plays a critical role in text recognition (TR), but customizing its architecture is relatively underexplored due to expensive manual tweaking. In this article, inspired by the success of neural architecture search (NAS), we propose to search for suitable feature extractors. We design a domain-specific search space by exploring principles for building good feature extractors. The space includes a 3D-structured space for the spatial model and a transformer-based space for the sequential model. As the space is huge and complexly structured, no existing NAS algorithm can be applied. We propose a two-stage algorithm to search the space effectively. In the first stage, we cut the space into several blocks and progressively train each block with the help of an auxiliary head. We introduce a latency constraint into the second stage and search for a sub-network from the trained supernet via natural gradient descent. In experiments, a series of ablation studies is performed to better understand the designed space, the search algorithm, and the searched architectures. We also compare the proposed method with various state-of-the-art ones on both handwritten and scene TR tasks. Extensive results show that our approach achieves better recognition performance with lower latency. Code is available at https://github.com/AutoML-Research/TREFE.
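The second-stage search described above can be illustrated with a minimal, hypothetical sketch: architectures are sampled from a per-block categorical distribution over candidate operations, a latency budget penalizes over-budget sub-networks, and the distribution's logits are updated from sampled rewards. All names, the toy reward, and the per-op latencies here are assumptions for illustration; the sketch uses a plain REINFORCE-style gradient step in place of the paper's natural gradient descent, and a toy score in place of supernet validation accuracy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy search space: each of 4 blocks picks one of 3 candidate ops.
NUM_BLOCKS, NUM_OPS = 4, 3
LATENCY = np.array([1.0, 2.0, 3.0])  # assumed per-op latency cost
LATENCY_BUDGET = 8.0


def reward(arch):
    """Stand-in for the validation accuracy of a sub-network taken from the
    trained supernet; here just a toy score with a latency penalty."""
    acc = float(np.sum(arch))               # pretend "bigger" ops score higher
    lat = float(LATENCY[arch].sum())
    return acc if lat <= LATENCY_BUDGET else acc - 10.0


def probs(theta):
    """Softmax over each block's logits -> per-block categorical distribution."""
    e = np.exp(theta - theta.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


theta = np.zeros((NUM_BLOCKS, NUM_OPS))     # logits of the search distribution

for step in range(200):
    p = probs(theta)
    # Sample a batch of architectures from the current distribution.
    samples = np.array([[rng.choice(NUM_OPS, p=p[b]) for b in range(NUM_BLOCKS)]
                        for _ in range(16)])
    rewards = np.array([reward(s) for s in samples])
    rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-8)  # baseline
    # Score-function gradient of expected reward w.r.t. the logits
    # (the paper preconditions this with the Fisher matrix; omitted here).
    grad = np.zeros_like(theta)
    for s, r in zip(samples, rewards):
        for b in range(NUM_BLOCKS):
            onehot = np.zeros(NUM_OPS)
            onehot[s[b]] = 1.0
            grad[b] += r * (onehot - p[b])
    theta += 0.1 * grad / len(samples)

best = probs(theta).argmax(axis=1)          # most likely sub-network per block
print(best, float(LATENCY[best].sum()))
```

In the actual method, `reward` would evaluate the sampled sub-network inside the trained supernet, and the update would be a natural gradient step on the distribution parameters rather than the vanilla gradient shown here.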