

MGFusion: a multimodal large language model-guided information perception for infrared and visible image fusion.

Authors

Yang Zengyi, Li Yunping, Tang Xin, Xie MingHong

Affiliations

Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan, China.

Kunming Cigarette Factory, Hongyunhonghe Tobacco Group Company Limited, Kunming, Yunnan, China.

Publication

Front Neurorobot. 2024 Dec 23;18:1521603. doi: 10.3389/fnbot.2024.1521603. eCollection 2024.

DOI: 10.3389/fnbot.2024.1521603
PMID: 39764200
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11700996/
Abstract

Existing image fusion methods primarily focus on complex network structure designs while neglecting the limitations of simple fusion strategies in complex scenarios. To address this issue, this study proposes a new method for infrared and visible image fusion based on a multimodal large language model. The method proposed in this paper fully considers the high demand for semantic information in enhancing image quality as well as the fusion strategies in complex scenes. We supplement the features in the fusion network with information from the multimodal large language model and construct a new fusion strategy. To achieve this goal, we design CLIP-driven Information Injection (CII) approach and CLIP-guided Feature Fusion (CFF) strategy. CII utilizes CLIP to extract robust image features rich in semantic information, which serve to supplement the information of infrared and visible features, thereby enhancing their representation capabilities for the scene. CFF further utilizes the robust image features extracted by CLIP to select and fuse the infrared and visible features after the injection of semantic information, addressing the challenges of image fusion in complex scenes. Compared to existing methods, the main advantage of the proposed method lies in leveraging the powerful semantic understanding capabilities of the multimodal large language model to supplement information for infrared and visible features, thus avoiding the need for complex network structure designs. Experimental results on multiple public datasets validate the effectiveness and superiority of the proposed method.
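The abstract names two components, CII (semantic information injection from CLIP features) and CFF (CLIP-guided selection and fusion). The paper's actual architecture is not reproduced here; the sketch below is only a hypothetical illustration of the general idea — adding a projected semantic vector to each modality's features, then weighting the modalities by their similarity to that vector. All names (`cii_inject`, `cff_fuse`), the projection matrix `proj`, and the toy dimensions are assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def cii_inject(feat, clip_vec, proj):
    """CII-style sketch: map a CLIP semantic vector into the feature's
    channel space and add it as a per-channel bias (hypothetical form)."""
    bias = proj @ clip_vec               # (C,)
    return feat + bias[:, None, None]    # broadcast over H, W

def cff_fuse(feat_ir, feat_vis, clip_vec, proj):
    """CFF-style sketch: weight each modality by the cosine similarity
    of its pooled feature to the projected CLIP vector, then blend."""
    ref = proj @ clip_vec                               # (C,)
    pooled = [f.mean(axis=(1, 2)) for f in (feat_ir, feat_vis)]
    sims = np.array([
        p @ ref / (np.linalg.norm(p) * np.linalg.norm(ref) + 1e-8)
        for p in pooled
    ])
    w = np.exp(sims) / np.exp(sims).sum()               # softmax weights
    return w[0] * feat_ir + w[1] * feat_vis, w

# Toy shapes: C=8 feature channels, 16x16 spatial, CLIP dim D=32.
C, H, W, D = 8, 16, 16, 32
proj = rng.standard_normal((C, D)) * 0.1
clip_vec = rng.standard_normal(D)
ir = cii_inject(rng.standard_normal((C, H, W)), clip_vec, proj)
vis = cii_inject(rng.standard_normal((C, H, W)), clip_vec, proj)
fused, w = cff_fuse(ir, vis, clip_vec, proj)
```

The point of the sketch is the division of labor the abstract describes: injection enriches each modality's representation with shared semantics, and the same semantic reference then decides how much each modality contributes to the fused output, without any extra fusion subnetwork.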


Figures:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/385e/11700996/a70b1ebead65/fnbot-18-1521603-g0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/385e/11700996/eb411738198a/fnbot-18-1521603-g0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/385e/11700996/2faba3b9bbdf/fnbot-18-1521603-g0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/385e/11700996/6b54d12cd5f6/fnbot-18-1521603-g0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/385e/11700996/dfce6a7e465c/fnbot-18-1521603-g0006.jpg
