Vipparla Chandrakanth, Krock Timothy, Nouduri Koundinya, Fraser Joshua, AliAkbarpour Hadi, Sagan Vasit, Cheng Jing-Ru C, Kannappan Palaniappan
Department of Electrical and Computer Engineering, University of Missouri, Columbia, MO 65211, USA.
Department of Computer Science, Saint Louis University, St. Louis, MO 63103, USA.
Sensors (Basel). 2024 Dec 23;24(24):8217. doi: 10.3390/s24248217.
Multi-modal systems extract information about the environment using specialized sensors that are optimized for the wavelength of the phenomenology and the material interactions of interest. To maximize entropy, complementary systems operating in non-overlapping wavelength regions are optimal. VIS-IR (Visible-Infrared) systems have been at the forefront of multi-modal fusion research and are used extensively to represent information in all-day, all-weather applications. Prior to image fusion, the image pairs have to be properly registered and mapped to a common resolution palette. However, due to differences in the device physics of image capture, information from VIS-IR sensors cannot be directly correlated, which is a major bottleneck for this area of research. In the absence of camera metadata, image registration is performed manually, which is not practical for large datasets. Most of the work published in this area assumes calibrated sensors and the availability of camera metadata providing registered image pairs, which limits the generalization capability of these systems. In this work, we propose a novel end-to-end pipeline for image registration and fusion. First, we design a recursive crop and scale wavelet spectral decomposition (WSD) algorithm for automatically extracting the patch of visible data representing the thermal information. After data extraction, both images are registered to a common resolution palette and forwarded to the DNN for image fusion. The fusion performance of the proposed pipeline is compared against state-of-the-art classical and DNN architectures on open-source and custom datasets, demonstrating the efficacy of the pipeline. Furthermore, we also propose a novel keypoint-based metric for quantifying the quality of the fused output.
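The idea of locating the visible patch that corresponds to the thermal image can be illustrated with a small sketch. This is not the authors' WSD algorithm, whose details are not given in the abstract. It is a simplified stand-in that assumes a square thermal image, a single-level Haar transform, nearest-neighbour rescaling, and an exhaustive (rather than recursive) search over candidate windows. All function names (`haar_detail_energy`, `ncc`, `crop_and_scale`, `find_thermal_patch`) are hypothetical. Because raw VIS and IR intensities are not directly comparable, the sketch matches the energy of the wavelet detail bands, which is insensitive to intensity inversion.

```python
from math import sqrt

def haar_detail_energy(img):
    """One-level 2D Haar transform on a list-of-lists image. Returns, per 2x2
    block, the summed energy of the three detail sub-bands: a crude,
    sign-invariant edge map at half resolution."""
    energy = []
    for r in range(0, len(img) - 1, 2):
        row = []
        for c in range(0, len(img[0]) - 1, 2):
            a, b = img[r][c], img[r][c + 1]
            d, e = img[r + 1][c], img[r + 1][c + 1]
            row_diff = (a + b - d - e) / 4.0   # top pair vs bottom pair
            col_diff = (a - b + d - e) / 4.0   # left pair vs right pair
            diag = (a - b - d + e) / 4.0       # diagonal detail
            row.append(row_diff ** 2 + col_diff ** 2 + diag ** 2)
        energy.append(row)
    return energy

def ncc(x, y):
    """Normalized cross-correlation of two equally sized 2D lists."""
    xs = [v for row in x for v in row]
    ys = [v for row in y for v in row]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((p - mx) * (q - my) for p, q in zip(xs, ys))
    den = sqrt(sum((p - mx) ** 2 for p in xs) * sum((q - my) ** 2 for q in ys))
    return num / den if den > 0 else 0.0

def crop_and_scale(img, top, left, size, out_size):
    """Crop a square window and resample it to out_size x out_size
    using nearest-neighbour sampling."""
    idx = [int(i * size / out_size) for i in range(out_size)]
    return [[img[top + r][left + c] for c in idx] for r in idx]

def find_thermal_patch(visible, thermal, window_sizes):
    """Exhaustively search square windows of the visible image (assumption:
    the thermal image is square). Returns (score, top, left, size) of the
    window whose detail-energy map best correlates with the thermal one."""
    target = haar_detail_energy(thermal)
    n = len(thermal)
    best = (-2.0, 0, 0, 0)
    for size in window_sizes:
        for top in range(len(visible) - size + 1):
            for left in range(len(visible[0]) - size + 1):
                patch = crop_and_scale(visible, top, left, size, n)
                score = ncc(haar_detail_energy(patch), target)
                if score > best[0]:
                    best = (score, top, left, size)
    return best
```

Given a visible image and a smaller thermal image as nested lists, `find_thermal_patch(visible, thermal, [8, 12])` returns the best-scoring window. The exhaustive search is only practical for small images. A coarse-to-fine or recursive refinement, as the abstract describes, would be needed at realistic resolutions.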