Xiang Zeyu
College of Information Engineering, Henan University of Science and Technology, Luoyang, 471000, Henan, China.
Sci Rep. 2025 Mar 18;15(1):9306. doi: 10.1038/s41598-025-93143-w.
Infrared and visible image fusion (VIS-IR fusion) enhances diagnostic accuracy in medical imaging and biological analysis. Existing CNN-based and Transformer-based methods suffer from computational inefficiency when modeling global dependencies. The author proposes VSS-SpatioNet, a lightweight architecture that replaces Transformer self-attention with a Visual State Space (VSS) module for efficient dependency modeling. The framework employs an asymmetric encoder-decoder built on a multi-scale autoencoder, together with a novel VSS-Spatial (VS) fusion block for integrating local and global features. Evaluations on the TNO, Harvard Medical, and RoadScene datasets demonstrate superior performance. On TNO, VSS-SpatioNet achieves state-of-the-art entropy (En = 7.0058) and mutual information (MI = 14.0116), outperforming 12 benchmark methods. On RoadScene, it attains a gradient-based fusion performance of $Q^{AB/F} = 0.5712$, Piella's metric $Q_W = 0.7926$, and an average gradient of AG = 5.2994, surpassing prior work. On Harvard Medical, the VS strategy improves mean gradient by 18.7% over FusionGAN (0.0224 vs. 0.0198), validating enhanced feature preservation. These results confirm the framework's efficacy in medical applications, particularly for precise tissue characterization.
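The abstract does not include code. Purely as an illustration of the design it describes, a local spatial branch paired with a global state-space branch over fused features, the following is a hypothetical PyTorch sketch. It assumes the VSS module follows the Mamba/VMamba family (here reduced to a diagonal linear SSM scan) and pairs it with a CBAM-style spatial-attention gate; all module names, shapes, and hyperparameters are illustrative and are not taken from the paper.

```python
# Hypothetical sketch of a VS-style fusion block: a spatial-attention
# branch (local) plus a simplified diagonal state-space scan (global).
# NOT the paper's implementation; names and shapes are illustrative.
import torch
import torch.nn as nn

class SimpleSSM(nn.Module):
    """Diagonal linear state-space scan over a token sequence:
    h_t = a * h_{t-1} + b * x_t,  y_t = c * h_t  (per channel)."""
    def __init__(self, dim: int):
        super().__init__()
        self.log_a = nn.Parameter(torch.zeros(dim))  # sigmoid keeps a in (0,1), so the scan is stable
        self.b = nn.Parameter(torch.ones(dim))
        self.c = nn.Parameter(torch.ones(dim))

    def forward(self, x):                  # x: (batch, seq_len, dim)
        a = torch.sigmoid(self.log_a)
        h = torch.zeros(x.size(0), x.size(2), device=x.device)
        ys = []
        for t in range(x.size(1)):         # sequential scan: O(L), fine for a sketch
            h = a * h + self.b * x[:, t]
            ys.append(self.c * h)
        return torch.stack(ys, dim=1)

class SpatialAttention(nn.Module):
    """Local branch: gate features with a conv-derived spatial map."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):                  # x: (B, C, H, W)
        avg = x.mean(dim=1, keepdim=True)
        mx, _ = x.max(dim=1, keepdim=True)
        gate = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * gate

class VSFusionBlock(nn.Module):
    """Fuse IR and visible feature maps: concat -> 1x1 projection ->
    global SSM branch + local spatial-attention branch -> sum."""
    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Conv2d(2 * dim, dim, kernel_size=1)
        self.ssm = SimpleSSM(dim)
        self.spatial = SpatialAttention()

    def forward(self, f_ir, f_vis):        # each: (B, C, H, W)
        x = self.proj(torch.cat([f_ir, f_vis], dim=1))
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)                    # (B, H*W, C)
        global_feat = self.ssm(tokens).transpose(1, 2).view(b, c, h, w)
        return self.spatial(x) + global_feat

if __name__ == "__main__":
    ir, vis = torch.randn(1, 16, 8, 8), torch.randn(1, 16, 8, 8)
    print(VSFusionBlock(16)(ir, vis).shape)   # torch.Size([1, 16, 8, 8])
```

The linear scan gives each output token a receptive field over all preceding tokens at O(L) cost, which is the efficiency argument state-space models make against quadratic self-attention; production VSS blocks additionally use input-dependent parameters and multi-directional scans.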
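The reported metrics (En, MI, AG) are standard in the fusion literature. As a reference, here is a minimal NumPy sketch of how they are commonly computed, assuming 8-bit grayscale inputs; note that fusion MI is conventionally reported as MI(IR, fused) + MI(VIS, fused), which is consistent with the MI value above being roughly twice the entropy. This is illustrative, not the paper's evaluation code.

```python
# Common definitions of En, MI, and AG for image-fusion evaluation;
# a minimal sketch assuming 8-bit grayscale images.
import numpy as np

def entropy(img: np.ndarray, bins: int = 256) -> float:
    """Shannon entropy of an image histogram, in bits (the En metric)."""
    hist, _ = np.histogram(img, bins=bins, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]                           # drop empty bins to avoid log(0)
    return float(-np.sum(p * np.log2(p)))

def mutual_information(a: np.ndarray, b: np.ndarray, bins: int = 256) -> float:
    """MI between two images, from their joint gray-level histogram."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(),
                                 bins=bins, range=[[0, 256], [0, 256]])
    pxy = joint / joint.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px[:, None] * py[None, :])[nz])))

def fusion_mi(ir, vis, fused):
    """The MI metric: information the fused image retains from both sources."""
    return mutual_information(ir, fused) + mutual_information(vis, fused)

def average_gradient(img: np.ndarray) -> float:
    """AG: mean magnitude of local intensity gradients (sharpness proxy)."""
    img = img.astype(np.float64)
    gx = np.diff(img, axis=1)[:-1, :]      # horizontal differences
    gy = np.diff(img, axis=0)[:, :-1]      # vertical differences
    return float(np.mean(np.sqrt((gx ** 2 + gy ** 2) / 2)))
```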