
CFANet: The Cross-Modal Fusion Attention Network for Indoor RGB-D Semantic Segmentation

Authors

Wu Long-Fei, Wei Dan, Xu Chang-An

Affiliations

School of Mechanical and Automotive Engineering, Shanghai University of Engineering Science, Shanghai 201620, China.

College of Materials and Energy, South China Agricultural University, Guangzhou 510642, China.

Publication

J Imaging. 2025 May 27;11(6):177. doi: 10.3390/jimaging11060177.

DOI: 10.3390/jimaging11060177
PMID: 40558775
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12194209/
Abstract

Indoor image semantic segmentation is applied in fields such as smart homes and indoor security. Semantic segmentation techniques that use RGB images and depth maps as data sources face two challenges: the semantic gap between the two modalities and the loss of detailed information. To address these issues, a multi-head self-attention mechanism adaptively aligns the features of the two modalities and fuses them in both the spatial and channel dimensions. Feature extraction methods are tailored to the different characteristics of RGB images and depth maps. For RGB images, asymmetric convolution captures features in the horizontal and vertical directions, strengthens short-range information dependence, and mitigates the gridding effect of dilated convolution, while criss-cross attention gathers contextual information from global dependency relationships. For depth maps, salient unimodal features are extracted along the channel and spatial dimensions. A lightweight skip-connection module fuses low-level and high-level features. In addition, since the first layer contains the richest detailed information and the last layer contains rich semantic information, a feature refinement head fuses the two. The method achieves mIoU scores of 53.86% and 51.85% on the NYUDv2 and SUN-RGBD datasets, respectively, outperforming mainstream methods.
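The cross-modal fusion step described in the abstract — attention that aligns RGB and depth features before fusing them — can be illustrated with a minimal single-head sketch. This is not the authors' implementation (which uses multi-head self-attention inside a full network); the function name, token shapes, and residual fusion here are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(rgb, depth):
    """Fuse depth features into RGB features via scaled dot-product
    attention: each RGB token queries the depth tokens (hypothetical sketch)."""
    d = rgb.shape[-1]
    scores = rgb @ depth.T / np.sqrt(d)   # (N_rgb, N_depth) similarities
    weights = softmax(scores, axis=-1)    # attention over depth tokens
    attended = weights @ depth            # depth context per RGB token
    return rgb + attended                 # residual fusion of the two modalities

# Toy example: 6 RGB tokens and 6 depth tokens, 8-dim features each
rng = np.random.default_rng(0)
rgb = rng.standard_normal((6, 8))
depth = rng.standard_normal((6, 8))
fused = cross_modal_attention(rgb, depth)
print(fused.shape)  # (6, 8)
```

In the paper this alignment is done per spatial position and per channel with multiple heads; the sketch only shows the core query-key-value mechanism that bridges the two modalities.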

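The scores quoted for NYUDv2 and SUN-RGBD (53.86% and 51.85%) are mean Intersection-over-Union values. A minimal sketch of how this metric is typically computed for label maps — not the benchmark's evaluation code — looks like this:

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean Intersection-over-Union across classes, the standard
    semantic-segmentation metric (illustrative sketch)."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union > 0:                  # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))

# Toy 2x3 label maps with 2 classes
pred   = np.array([[0, 0, 1], [1, 1, 0]])
target = np.array([[0, 1, 1], [1, 1, 0]])
print(mean_iou(pred, target, 2))  # ≈ 0.708
```

Benchmarks usually accumulate the per-class intersections and unions over the whole test set before averaging, rather than per image as shown here.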

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/641a/12194209/8b0399af6864/jimaging-11-00177-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/641a/12194209/1158a1a9fe0a/jimaging-11-00177-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/641a/12194209/8bcf9a7e7ead/jimaging-11-00177-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/641a/12194209/096f2f105481/jimaging-11-00177-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/641a/12194209/4f9d0c923f30/jimaging-11-00177-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/641a/12194209/023b58694403/jimaging-11-00177-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/641a/12194209/c32c7b4bcdd6/jimaging-11-00177-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/641a/12194209/0fdf9976339d/jimaging-11-00177-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/641a/12194209/b732084c260e/jimaging-11-00177-g010.jpg

Similar Articles

1. DGCFNet: Dual Global Context Fusion Network for remote sensing image semantic segmentation.
   PeerJ Comput Sci. 2025 Mar 27;11:e2786. doi: 10.7717/peerj-cs.2786. eCollection 2025.
2. Liver Semantic Segmentation Method Based on Multi-Channel Feature Extraction and Cross Fusion.
   Bioengineering (Basel). 2025 Jun 11;12(6):636. doi: 10.3390/bioengineering12060636.
3. MACCoM: A multiple attention and convolutional cross-mixer framework for detailed 2D biomedical image segmentation.
   Comput Biol Med. 2024 Sep;179:108847. doi: 10.1016/j.compbiomed.2024.108847. Epub 2024 Jul 15.
4. TLTNet: A novel transscale cascade layered transformer network for enhanced retinal blood vessel segmentation.
   Comput Biol Med. 2024 Aug;178:108773. doi: 10.1016/j.compbiomed.2024.108773. Epub 2024 Jun 25.
5. Prediction of Alzheimer's Disease Based on Multi-Modal Domain Adaptation.
   Brain Sci. 2025 Jun 7;15(6):618. doi: 10.3390/brainsci15060618.
6. Lightweight 2D Medical Image Segmentation via a Decoder Using Linear Deformable Convolution and Multi-scale Self-attention.
   IEEE J Biomed Health Inform. 2025 Jun 25;PP. doi: 10.1109/JBHI.2025.3583108.
7. CDFAN: Cross-Domain Fusion Attention Network for Pansharpening.
   Entropy (Basel). 2025 May 27;27(6):567. doi: 10.3390/e27060567.
8. GaitCSF: Multi-Modal Gait Recognition Network Based on Channel Shuffle Regulation and Spatial-Frequency Joint Learning.
   Sensors (Basel). 2025 Jun 16;25(12):3759. doi: 10.3390/s25123759.
9. Multi-class segmentation of knee MRI based on hybrid attention.
   Front Med (Lausanne). 2025 Jun 11;12:1581487. doi: 10.3389/fmed.2025.1581487. eCollection 2025.

References Cited in This Article

1. AESeg: Affinity-enhanced segmenter using feature class mapping knowledge distillation for efficient RGB-D semantic segmentation of indoor scenes.
   Neural Netw. 2025 Aug;188:107438. doi: 10.1016/j.neunet.2025.107438. Epub 2025 Mar 25.
2. Enhanced CATBraTS for Brain Tumour Semantic Segmentation.
   J Imaging. 2025 Jan 3;11(1):8. doi: 10.3390/jimaging11010008.
3. Efficient sub-pixel convolutional neural network for terahertz image super-resolution.
   Opt Lett. 2022 Jun 15;47(12):3115-3118. doi: 10.1364/OL.454267.
4. SA-Net: A scale-attention network for medical image segmentation.
   PLoS One. 2021 Apr 14;16(4):e0247388. doi: 10.1371/journal.pone.0247388. eCollection 2021.
5. Spatial Information Guided Convolution for Real-Time RGBD Semantic Segmentation.
   IEEE Trans Image Process. 2021;30:2313-2324. doi: 10.1109/TIP.2021.3049332. Epub 2021 Jan 27.
6. Semantic Segmentation with Context Encoding and Multi-Path Decoding.
   IEEE Trans Image Process. 2020 Jan 9. doi: 10.1109/TIP.2019.2962685.
7. SCN: Switchable Context Network for Semantic Segmentation of RGB-D Images.
   IEEE Trans Cybern. 2020 Mar;50(3):1120-1131. doi: 10.1109/TCYB.2018.2885062. Epub 2018 Dec 20.
8. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs.
   IEEE Trans Pattern Anal Mach Intell. 2018 Apr;40(4):834-848. doi: 10.1109/TPAMI.2017.2699184. Epub 2017 Apr 27.
9. Fast Feature Pyramids for Object Detection.
   IEEE Trans Pattern Anal Mach Intell. 2014 Aug;36(8):1532-45. doi: 10.1109/TPAMI.2014.2300479.