Liu Risheng, Liu Zhu, Liu Jinyuan, Fan Xin, Luo Zhongxuan
IEEE Trans Pattern Anal Mach Intell. 2024 Oct;46(10):6594-6609. doi: 10.1109/TPAMI.2024.3382308. Epub 2024 Sep 5.
Image fusion plays a key role in a variety of multi-sensor-based vision systems, especially for enhancing visual quality and/or extracting aggregated features for perception. However, most existing methods treat image fusion as an isolated task, ignoring its underlying relationship with downstream vision problems. Furthermore, designing proper fusion architectures often requires substantial engineering effort, and current fusion approaches lack mechanisms for improving their flexibility and generalization ability. To mitigate these issues, we establish a Task-guided, Implicit-searched and Meta-initialized (TIM) deep model to address the image fusion problem in a challenging real-world scenario. Specifically, we first propose a constrained strategy to incorporate information from downstream tasks to guide the unsupervised learning process of image fusion. Within this framework, we then design an implicit search scheme to automatically discover compact architectures for our fusion model with high efficiency. In addition, a pretext meta-initialization technique is introduced to leverage divergent fusion data to support fast adaptation to different kinds of image fusion tasks. Qualitative and quantitative experimental results on different categories of image fusion problems and related downstream tasks (e.g., visual enhancement and semantic understanding) substantiate the flexibility and effectiveness of our TIM.