Ma Decao, Su Juan, Li Shaopeng, Xian Yong
Xi'an Research Institute of High Technology, 710025, Xi'an, China.
Department of Automation, Tsinghua University, 100084, Beijing, China.
Sci Rep. 2024 Sep 27;14(1):22105. doi: 10.1038/s41598-024-73381-0.
Because of high equipment costs and restrictive shooting conditions, aerial infrared images of specific targets are difficult to obtain. Most methods that use Generative Adversarial Networks to translate visible images to infrared depend heavily on registered (paired) data and struggle with the diversity and complexity of aerial infrared scenes. This paper proposes a one-sided, end-to-end, unpaired aerial visible-to-infrared image translation algorithm, termed AerialIRGAN. AerialIRGAN introduces a dual-encoder structure: one encoder, based on the Segment Anything Model, extracts deep semantic features from visible images, and the other, based on UniRepLKNet, captures small-scale and sparse patterns from the same images. AerialIRGAN then constructs a bridging module that deeply integrates the features of both encoders and their corresponding decoders. Finally, AerialIRGAN proposes a structural appearance consistency loss that guides the synthetic infrared images to preserve the structure of the source image while exhibiting distinct infrared characteristics. Experimental results show that, compared with typical existing infrared image generation algorithms, the proposed method generates higher-quality infrared images and performs better in both subjective visual assessment and objective metric evaluation.
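The abstract does not give the form of the structural appearance consistency loss. As a rough illustration of the idea behind its structural half only, the sketch below penalizes differences between the edge (gradient) maps of a source image and a generated image, so that a pure brightness change costs nothing while a loss of edges is penalized. The names `gradients` and `structure_loss`, the forward-difference edge map, and the L1 distance are all assumptions made for this example; they are not taken from the paper and should not be read as AerialIRGAN's actual loss.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def gradients(img: Image) -> List[float]:
    """Return horizontal and vertical forward differences, flattened."""
    h, w = len(img), len(img[0])
    grads: List[float] = []
    for y in range(h):
        for x in range(w):
            if x + 1 < w:  # horizontal neighbour exists
                grads.append(img[y][x + 1] - img[y][x])
            if y + 1 < h:  # vertical neighbour exists
                grads.append(img[y + 1][x] - img[y][x])
    return grads


def structure_loss(source: Image, generated: Image) -> float:
    """Mean absolute difference between the gradient maps of two images.

    Hypothetical stand-in for a structure-preservation term; the
    paper's real loss is not specified in the abstract.
    """
    gs, gg = gradients(source), gradients(generated)
    if len(gs) != len(gg):
        raise ValueError("images must have the same shape")
    if not gs:  # a 1x1 image has no gradients to compare
        return 0.0
    return sum(abs(a - b) for a, b in zip(gs, gg)) / len(gs)


# Toy data: a vertical edge, the same edge with a brightness offset,
# and a flat image in which the edge has been lost.
visible = [[0.0, 0.0, 1.0] for _ in range(3)]
shifted = [[v + 0.3 for v in row] for row in visible]
flat = [[0.5] * 3 for _ in range(3)]

print(structure_loss(visible, shifted))  # near zero: structure kept
print(structure_loss(visible, flat))     # positive: structure lost
```

A gradient-based term like this one is sensitive to contrast inversion, which is common between visible and infrared imagery (a dark object can be hot and therefore bright), so a practical loss would likely need a sign-invariant or feature-level comparison instead.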