Zhang Xiaomin
College of Internet of Things and Artificial Intelligence, Fujian Polytechnic of Information Technology, Fuzhou, 350003, Fujian, China.
Sci Rep. 2024 Apr 24;14(1):9435. doi: 10.1038/s41598-024-59384-x.
Recently, convolutional neural networks (CNNs) and Transformer-based networks have delivered strong results in remote sensing image super-resolution (RSISR). Nevertheless, how to effectively fuse the inductive bias of CNNs with the long-range modeling capability of the Transformer architecture remains relatively unexplored in RSISR. Accordingly, we propose an uncertainty-driven mixture convolution and transformer network (UMCTN) to improve performance. Specifically, to acquire multi-scale and hierarchical features, UMCTN adopts a U-shape architecture. The encoder and decoder use residual dual-view aggregation groups (RDAGs) built from dual-view aggregation blocks (DABs), while a novel dense-sparse transformer group (DSTG) is introduced only in the latent layer. This design largely avoids the considerable quadratic complexity inherent in vanilla Transformer structures. Moreover, we introduce a novel uncertainty-driven loss (UDL) to steer the network's attention towards pixels exhibiting significant variance, with the primary objective of improving reconstruction quality in texture and edge regions. Experimental results on the UCMerced LandUse and AID datasets show that UMCTN achieves state-of-the-art performance compared with existing methods.
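The idea behind the uncertainty-driven loss, weighting the reconstruction error so that high-variance pixels count for more, can be illustrated with a minimal sketch. The abstract does not give the UDL formula, so everything below is an assumption made for illustration: the function name `uncertainty_weighted_l1`, the use of an L1 error, and the normalisation of the variance map to a mean weight of one are hypothetical choices, not the authors' definition.

```python
import numpy as np


def uncertainty_weighted_l1(sr, hr, variance, eps=1e-8):
    """Mean absolute error with per-pixel weights derived from a variance map.

    Hypothetical illustration only: the weighting scheme is an assumption,
    not the UDL formula from the paper.
    """
    sr = np.asarray(sr, dtype=np.float64)
    hr = np.asarray(hr, dtype=np.float64)
    variance = np.asarray(variance, dtype=np.float64)

    # Normalise so the average weight is about 1; a uniform variance map
    # then reduces the loss to a plain L1 loss.
    weights = variance / (variance.mean() + eps)

    # Errors at high-variance pixels (e.g. textures, edges) are scaled up.
    return float(np.mean(weights * np.abs(sr - hr)))


# Toy 2x2 example: a single erroneous pixel in the top-left corner.
hr = np.zeros((2, 2))
sr = np.array([[1.0, 0.0], [0.0, 0.0]])

uniform = np.ones((2, 2))                      # no pixel is emphasised
peaked = np.array([[3.0, 1.0], [1.0, 1.0]])    # the erroneous pixel has high variance

loss_uniform = uncertainty_weighted_l1(sr, hr, uniform)
loss_peaked = uncertainty_weighted_l1(sr, hr, peaked)
```

In this toy case the same error costs more when it falls on a high-variance pixel, which is the behaviour the abstract attributes to UDL. In a real training setup the variance map would have to be estimated by the network itself, and how UMCTN does that is not stated here.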