Xiang Zeyu
College of Information Engineering, Henan University of Science and Technology, Luoyang, 471000, Henan, China.
Sci Rep. 2025 Mar 18;15(1):9306. doi: 10.1038/s41598-025-93143-w.
Infrared and visible image fusion (VIS-IR fusion) enhances diagnostic accuracy in medical imaging and biological analysis. Existing CNN-based and Transformer-based methods suffer from computational inefficiency when modeling global dependencies. The author proposes VSS-SpatioNet, a lightweight architecture that replaces Transformer self-attention with a Visual State Space (VSS) module for efficient dependency modeling. The framework employs an asymmetric encoder-decoder built on a multi-scale autoencoder, together with a novel VSS-Spatial (VS) fusion block for integrating local and global features. Evaluations on the TNO, Harvard Medical, and RoadScene datasets demonstrate superior performance. On TNO, VSS-SpatioNet achieves state-of-the-art entropy (En = 7.0058) and mutual information (MI = 14.0116), outperforming 12 benchmark methods. On RoadScene, it attains a gradient-based fusion performance of $Q^{AB/F} = 0.5712$, Piella's metric $Q_W = 0.7926$, and an average gradient of AG = 5.2994, surpassing prior work. On Harvard Medical, the VS strategy improves mean gradient by 18.7% over FusionGAN (0.0224 vs. 0.0198), validating enhanced feature preservation. These results confirm the framework's efficacy in medical applications, particularly for precise tissue characterization.
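The abstract does not include code. Purely as an illustration of the design it describes, a local spatial branch paired with a global state-space branch over fused features, the following is a hypothetical PyTorch sketch. It assumes the VSS module follows the Mamba/VMamba family (here reduced to a diagonal linear SSM scan) and pairs it with a CBAM-style spatial-attention gate; all module names, shapes, and hyperparameters are illustrative and are not taken from the paper.

```python
# Hypothetical sketch of a VS-style fusion block: a spatial-attention
# branch (local) plus a simplified diagonal state-space scan (global).
# NOT the paper's implementation; names and shapes are illustrative.
import torch
import torch.nn as nn

class SimpleSSM(nn.Module):
    """Diagonal linear state-space scan over a token sequence:
    h_t = a * h_{t-1} + b * x_t,  y_t = c * h_t  (per channel)."""
    def __init__(self, dim: int):
        super().__init__()
        self.log_a = nn.Parameter(torch.zeros(dim))  # sigmoid keeps a in (0,1), so the scan is stable
        self.b = nn.Parameter(torch.ones(dim))
        self.c = nn.Parameter(torch.ones(dim))

    def forward(self, x):                  # x: (batch, seq_len, dim)
        a = torch.sigmoid(self.log_a)
        h = torch.zeros(x.size(0), x.size(2), device=x.device)
        ys = []
        for t in range(x.size(1)):         # sequential scan: O(L), fine for a sketch
            h = a * h + self.b * x[:, t]
            ys.append(self.c * h)
        return torch.stack(ys, dim=1)

class SpatialAttention(nn.Module):
    """Local branch: gate features with a conv-derived spatial map."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):                  # x: (B, C, H, W)
        avg = x.mean(dim=1, keepdim=True)
        mx, _ = x.max(dim=1, keepdim=True)
        gate = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * gate

class VSFusionBlock(nn.Module):
    """Fuse IR and visible feature maps: concat -> 1x1 projection ->
    global SSM branch + local spatial-attention branch -> sum."""
    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Conv2d(2 * dim, dim, kernel_size=1)
        self.ssm = SimpleSSM(dim)
        self.spatial = SpatialAttention()

    def forward(self, f_ir, f_vis):        # each: (B, C, H, W)
        x = self.proj(torch.cat([f_ir, f_vis], dim=1))
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)                    # (B, H*W, C)
        global_feat = self.ssm(tokens).transpose(1, 2).view(b, c, h, w)
        return self.spatial(x) + global_feat

if __name__ == "__main__":
    ir, vis = torch.randn(1, 16, 8, 8), torch.randn(1, 16, 8, 8)
    print(VSFusionBlock(16)(ir, vis).shape)   # torch.Size([1, 16, 8, 8])
```

The linear scan gives each output token a receptive field over all preceding tokens at O(L) cost, which is the efficiency argument state-space models make against quadratic self-attention; production VSS blocks additionally use input-dependent parameters and multi-directional scans.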
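The reported metrics (En, MI, AG) are standard in the fusion literature. As a reference, here is a minimal NumPy sketch of how they are commonly computed, assuming 8-bit grayscale inputs; note that fusion MI is conventionally reported as MI(IR, fused) + MI(VIS, fused), which is consistent with the MI value above being roughly twice the entropy. This is illustrative, not the paper's evaluation code.

```python
# Common definitions of En, MI, and AG for image-fusion evaluation;
# a minimal sketch assuming 8-bit grayscale images.
import numpy as np

def entropy(img: np.ndarray, bins: int = 256) -> float:
    """Shannon entropy of an image histogram, in bits (the En metric)."""
    hist, _ = np.histogram(img, bins=bins, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]                           # drop empty bins to avoid log(0)
    return float(-np.sum(p * np.log2(p)))

def mutual_information(a: np.ndarray, b: np.ndarray, bins: int = 256) -> float:
    """MI between two images, from their joint gray-level histogram."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(),
                                 bins=bins, range=[[0, 256], [0, 256]])
    pxy = joint / joint.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px[:, None] * py[None, :])[nz])))

def fusion_mi(ir, vis, fused):
    """The MI metric: information the fused image retains from both sources."""
    return mutual_information(ir, fused) + mutual_information(vis, fused)

def average_gradient(img: np.ndarray) -> float:
    """AG: mean magnitude of local intensity gradients (sharpness proxy)."""
    img = img.astype(np.float64)
    gx = np.diff(img, axis=1)[:-1, :]      # horizontal differences
    gy = np.diff(img, axis=0)[:, :-1]      # vertical differences
    return float(np.mean(np.sqrt((gx ** 2 + gy ** 2) / 2)))
```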