Xiao Yi, Yuan Qiangqiang, Jiang Kui, He Jiang, Lin Chia-Wen, Zhang Liangpei
IEEE Trans Image Process. 2024;33:738-752. doi: 10.1109/TIP.2023.3349004. Epub 2024 Jan 12.
Transformer-based methods have demonstrated promising performance in image super-resolution tasks, owing to their long-range and global aggregation capabilities. However, existing Transformers face two critical challenges when applied to large-area earth observation scenes: (1) redundant token representation caused by the many irrelevant tokens; (2) single-scale representation, which ignores scale correlation modeling of similar ground observation targets. To this end, this paper proposes to adaptively eliminate the interference of irrelevant tokens for a more compact self-attention calculation. Specifically, we devise a Residual Token Selective Group (RTSG) to capture the most crucial tokens by dynamically selecting the top-k keys by score ranking for each query. For better feature aggregation, a Multi-scale Feed-forward Layer (MFL) is developed to generate an enriched representation of multi-scale feature mixtures during the feed-forward process. Moreover, we also propose a Global Context Attention (GCA) to fully exploit the most informative components, thus introducing more inductive bias into the RTSG for accurate reconstruction. In particular, multiple cascaded RTSGs form our final Top-k Token Selective Transformer (TTST) to achieve progressive representation. Extensive experiments on simulated and real-world remote sensing datasets demonstrate that our TTST performs favorably against state-of-the-art CNN-based and Transformer-based methods, both qualitatively and quantitatively. In brief, TTST outperforms the state-of-the-art approach (HAT-L) in terms of PSNR by 0.14 dB on average, while accounting for only 47.26% and 46.97% of its computational cost and parameters, respectively. The code and pre-trained TTST will be available at https://github.com/XY-boy/TTST for validation.
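The core idea of the RTSG, selecting the top-k highest-scoring keys per query before computing attention, can be illustrated with a minimal NumPy sketch. This is a hypothetical single-head illustration of top-k sparse attention, not the authors' released implementation; the function name and signature are assumptions for exposition.

```python
import numpy as np

def topk_attention(Q, K, V, k):
    """Sparse self-attention where each query attends only to its
    top-k highest-scoring keys (illustrative sketch of the top-k
    selection idea, not the paper's RTSG module)."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                 # (n_q, n_k) similarity scores
    # Indices of the top-k keys for every query (order within top-k irrelevant)
    topk_idx = np.argpartition(scores, -k, axis=-1)[:, -k:]
    # Mask out all non-top-k scores with -inf so softmax assigns them weight 0
    mask = np.full_like(scores, -np.inf)
    np.put_along_axis(mask, topk_idx, 0.0, axis=-1)
    masked = scores + mask
    # Softmax over only the k surviving scores per query
    e = np.exp(masked - masked.max(axis=-1, keepdims=True))
    attn = e / e.sum(axis=-1, keepdims=True)
    return attn @ V                               # (n_q, d) aggregated values
```

When k equals the number of keys, this reduces to standard dense attention; smaller k discards the low-score (likely irrelevant) tokens, which is the compaction effect the abstract describes.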