Zhu Lin, Mao Yuxing, Pan Jianyu
State Key Laboratory of Power Transmission Equipment Technology, School of Electrical Engineering, Chongqing University, Chongqing 400044, China.
Sensors (Basel). 2025 Jul 26;25(15):4628. doi: 10.3390/s25154628.
To overcome the limitations of traditional image alignment methods in capturing deep semantic features, a deep feature information image alignment network (DFA-Net) is proposed, which aims to enhance image alignment performance through multi-level feature learning. DFA-Net builds on a deep residual architecture and introduces spatial pyramid pooling to achieve cross-scale feature fusion, effectively enhancing the features' adaptability to scale. A feature enhancement module based on the self-attention mechanism is designed: through a dynamic weight allocation strategy, it emphasizes key features that exhibit geometric invariance and high discriminative power, thereby improving the network's robustness to multimodal image deformation. Experiments on two public datasets, MSRS and RoadScene, show that the method performs well in alignment accuracy: compared with the benchmark model, RMSE is reduced by 0.661 and 0.473 on MSRS and RoadScene, respectively, while SSIM, MI, and NCC are improved by 0.155, 0.163, and 0.211 on MSRS and by 0.108, 0.226, and 0.114 on RoadScene. The visualization results confirm a significant improvement in the features' visual quality and demonstrate the method's advantages in the stability and discriminative power of deep feature extraction.
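To illustrate the two building blocks named in the abstract, the following NumPy sketch shows spatial pyramid pooling (fixed-length multi-scale descriptors from a feature map) and a scaled dot-product self-attention step that dynamically reweights feature vectors. This is a minimal conceptual sketch, not the authors' implementation; the function names, pyramid levels, and use of max pooling are illustrative assumptions.

```python
import numpy as np

def spatial_pyramid_pool(feat, levels=(1, 2, 4)):
    """Pool an H x W x C feature map over several grid resolutions and
    concatenate the results into one fixed-length multi-scale descriptor.
    Output length = C * sum(n*n for n in levels). Levels are illustrative."""
    h, w, c = feat.shape
    pooled = []
    for n in levels:
        for i in range(n):
            for j in range(n):
                ys, ye = (i * h) // n, ((i + 1) * h) // n
                xs, xe = (j * w) // n, ((j + 1) * w) // n
                # Max-pool each grid cell down to a single C-vector.
                pooled.append(feat[ys:ye, xs:xe].max(axis=(0, 1)))
    return np.concatenate(pooled)

def self_attention_reweight(tokens):
    """Scaled dot-product self-attention over a set of N feature vectors
    of dimension D: each vector is replaced by an attention-weighted mix
    of all vectors, so mutually consistent, discriminative features
    receive larger dynamic weights."""
    d = tokens.shape[1]
    scores = tokens @ tokens.T / np.sqrt(d)       # (N, N) similarity
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True) # row-wise softmax
    return weights @ tokens                       # (N, D) reweighted set
```

For example, an 8x8x3 feature map pooled at levels (1, 2, 4) yields a descriptor of length 3 * (1 + 4 + 16) = 63, independent of the input's spatial size, which is what makes the representation scale-adaptive.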