


A Depth Awareness and Learnable Feature Fusion Network for Enhanced Geometric Perception in Semantic Correspondence.

Authors

Li Fazeng, Zou Chunlong, Yun Juntong, Huang Li, Liu Ying, Tao Bo, Xie Yuanmin

Affiliations

Key Laboratory of Metallurgical Equipment and Control Technology of Ministry of Education, Wuhan University of Science and Technology, Wuhan 430081, China.

College of Mechanical Engineering, Hubei University of Automotive Technology, Shiyan 442000, China.

Publication

Sensors (Basel). 2024 Oct 17;24(20):6680. doi: 10.3390/s24206680.

DOI: 10.3390/s24206680
PMID: 39460160
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11511390/
Abstract

Deep learning is becoming the most widely used technology for multi-sensor data fusion. Semantic correspondence has recently emerged as a foundational task, enabling a range of downstream applications, such as style or appearance transfer, robot manipulation, and pose estimation, through its ability to provide robust correspondence in RGB images with semantic information. However, current representations generated by self-supervised learning and generative models are often limited in their ability to capture and understand the geometric structure of objects, which is significant for matching the correct details in applications of semantic correspondence. Furthermore, efficiently fusing these two types of features presents an interesting challenge. Achieving harmonious integration of these features is crucial for improving the expressive power of models in various tasks. To tackle these issues, our key idea is to integrate depth information from depth estimation or depth sensors into feature maps and leverage learnable weights for feature fusion. First, depth information is used to model pixel-wise depth distributions, assigning relative depth weights to feature maps for perceiving an object's structural information. Then, based on a contrastive learning optimization objective, a series of weights are optimized to leverage feature maps from self-supervised learning and generative models. Depth features are naturally embedded into feature maps, guiding the network to learn geometric structure information about objects and alleviating depth ambiguity issues. Experiments on the SPair-71K and AP-10K datasets show that the proposed method achieves scores of 81.8 and 83.3 on the percentage of correct keypoints (PCK) at the 0.1 level, respectively. Our approach not only demonstrates significant advantages in experimental results but also introduces the depth awareness module and a learnable feature fusion module, which enhances the understanding of object structures through depth information and fully utilizes features from various pre-trained models, offering new possibilities for the application of deep learning in RGB and depth data fusion technologies. We will also continue to focus on accelerating model inference and optimizing model lightweighting, enabling our model to operate at a faster speed.
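The core idea in the abstract — modulating feature maps with pixel-wise relative depth weights and then combining them with learnable, normalized fusion weights — can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the function name, the particular depth-weighting choice (emphasizing nearer pixels), and the softmax normalization of the learnable logits are all assumptions.

```python
import numpy as np

def depth_weighted_fusion(feat_maps, depth, fusion_logits):
    """Illustrative sketch: modulate each feature map with pixel-wise
    relative depth weights, then combine the maps with learnable
    (softmax-normalized) fusion weights. Names are hypothetical."""
    # Normalize the relative depth map to [0, 1] per image.
    d = (depth - depth.min()) / (depth.max() - depth.min() + 1e-8)
    # One simple choice of pixel-wise depth weight: emphasize nearer pixels.
    depth_w = 1.0 - d  # shape (H, W)
    # Softmax turns unconstrained learnable logits into convex weights,
    # one weight per source feature map (e.g., SSL vs. generative features).
    w = np.exp(fusion_logits - fusion_logits.max())
    w = w / w.sum()
    # Weighted sum of depth-modulated feature maps.
    fused = np.zeros_like(feat_maps[0])
    for wi, f in zip(w, feat_maps):
        fused += wi * (f * depth_w[None, :, :])  # broadcast over channels
    return fused

# Toy example: two (C, H, W) feature maps from different backbones.
rng = np.random.default_rng(0)
f_ssl = rng.standard_normal((4, 8, 8))  # stand-in for SSL features
f_gen = rng.standard_normal((4, 8, 8))  # stand-in for generative features
depth = rng.random((8, 8))              # stand-in for an estimated depth map
fused = depth_weighted_fusion([f_ssl, f_gen], depth, np.array([0.0, 0.0]))
print(fused.shape)  # → (4, 8, 8)
```

In the paper itself the fusion weights are optimized under a contrastive learning objective; here equal logits simply yield a 0.5/0.5 blend.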

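The reported scores use the standard PCK@0.1 metric (percentage of correct keypoints): a predicted keypoint counts as correct if it falls within 0.1 times the larger side of the object bounding box from the ground truth. A minimal sketch of that computation (the function name and inputs are illustrative):

```python
import numpy as np

def pck(pred_kpts, gt_kpts, bbox_size, alpha=0.1):
    """PCK@alpha: fraction of predicted keypoints within
    alpha * bbox_size (max of bbox width/height) of ground truth."""
    thresh = alpha * bbox_size
    dists = np.linalg.norm(pred_kpts - gt_kpts, axis=1)
    return float((dists <= thresh).mean())

# Toy example with a 100-px bounding box, so the threshold is 10 px.
gt = np.array([[10.0, 10.0], [50.0, 40.0], [80.0, 90.0]])
pred = np.array([[12.0, 11.0], [70.0, 40.0], [81.0, 92.0]])
score = pck(pred, gt, bbox_size=100.0, alpha=0.1)
print(score)  # 2 of 3 keypoints within 10 px → 0.666...
```

A reported score of 81.8 at the 0.1 level therefore means 81.8% of keypoints landed within that normalized distance on SPair-71K.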

Figures:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0c4a/11511390/c2a974c6ba95/sensors-24-06680-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0c4a/11511390/0b68dee5a3f3/sensors-24-06680-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0c4a/11511390/2574b7a4d656/sensors-24-06680-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0c4a/11511390/df59cf6724c0/sensors-24-06680-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0c4a/11511390/5a7e33c60356/sensors-24-06680-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0c4a/11511390/98224c42dcae/sensors-24-06680-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0c4a/11511390/769f554cb601/sensors-24-06680-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0c4a/11511390/f78418d69c1a/sensors-24-06680-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0c4a/11511390/95a42dff65b4/sensors-24-06680-g009.jpg

Similar Articles

1. SLMSF-Net: A Semantic Localization and Multi-Scale Fusion Network for RGB-D Salient Object Detection. Sensors (Basel). 2024 Feb 8;24(4):1117. doi: 10.3390/s24041117.
2. Repeated Cross-Scale Structure-Induced Feature Fusion Network for 2D Hand Pose Estimation. Entropy (Basel). 2023 Apr 27;25(5):724. doi: 10.3390/e25050724.
3. SFA-MDEN: Semantic-Feature-Aided Monocular Depth Estimation Network Using Dual Branches. Sensors (Basel). 2021 Aug 13;21(16):5476. doi: 10.3390/s21165476.
4. MAFFNet: Real-Time Multi-Level Attention Feature Fusion Network with RGB-D Semantic Segmentation for Autonomous Driving. Appl Opt. 2022 Mar 20;61(9):2219-2229. doi: 10.1364/AO.449589.
5. An Interactive Image Segmentation Method Based on Multi-Level Semantic Fusion. Sensors (Basel). 2023 Jul 14;23(14):6394. doi: 10.3390/s23146394.
6. GMNet: Graded-Feature Multilabel-Learning Network for RGB-Thermal Urban Scene Semantic Segmentation. IEEE Trans Image Process. 2021;30:7790-7802. doi: 10.1109/TIP.2021.3109518. Epub 2021 Sep 14.
7. Absolute and Relative Depth-Induced Network for RGB-D Salient Object Detection. Sensors (Basel). 2023 Mar 30;23(7):3611. doi: 10.3390/s23073611.
8. Learning Semantic-Aware Local Features for Long Term Visual Localization. IEEE Trans Image Process. 2022;31:4842-4855. doi: 10.1109/TIP.2022.3187565. Epub 2022 Jul 20.
9. Geometric Boundary Guided Feature Fusion and Spatial-Semantic Context Aggregation for Semantic Segmentation of Remote Sensing Images. IEEE Trans Image Process. 2023;32:6373-6385. doi: 10.1109/TIP.2023.3326400. Epub 2023 Nov 28.

References Cited in This Article

1. RCRFNet: Enhancing Object Detection with Self-Supervised Radar-Camera Fusion and Open-Set Recognition. Sensors (Basel). 2024 Jul 24;24(15):4803. doi: 10.3390/s24154803.
2. Online Scene Semantic Understanding Based on Sparsely Correlated Network for AR. Sensors (Basel). 2024 Jul 22;24(14):4756. doi: 10.3390/s24144756.
3. BAFusion: Bidirectional Attention Fusion for 3D Object Detection Based on LiDAR and Camera. Sensors (Basel). 2024 Jul 20;24(14):4718. doi: 10.3390/s24144718.
4. Neural Colour Correction for Indoor 3D Reconstruction Using RGB-D Data. Sensors (Basel). 2024 Jun 26;24(13):4141. doi: 10.3390/s24134141.
5. Diffusion Models in Vision: A Survey. IEEE Trans Pattern Anal Mach Intell. 2023 Sep;45(9):10850-10869. doi: 10.1109/TPAMI.2023.3261988. Epub 2023 Aug 7.
6. CATs++: Boosting Cost Aggregation With Convolutions and Transformers. IEEE Trans Pattern Anal Mach Intell. 2023 Jun;45(6):7174-7194. doi: 10.1109/TPAMI.2022.3218727. Epub 2023 May 5.
7. Leveraging Geometric Structure for Label-Efficient Semi-Supervised Scene Segmentation. IEEE Trans Image Process. 2022;31:6320-6330. doi: 10.1109/TIP.2022.3208735. Epub 2022 Oct 10.
8. Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-Shot Cross-Dataset Transfer. IEEE Trans Pattern Anal Mach Intell. 2022 Mar;44(3):1623-1637. doi: 10.1109/TPAMI.2020.3019967. Epub 2022 Feb 3.
9. SIFT Flow: Dense Correspondence Across Scenes and Its Applications. IEEE Trans Pattern Anal Mach Intell. 2011 May;33(5):978-94. doi: 10.1109/TPAMI.2010.147.