Uni-to-Multi Modal Knowledge Distillation for Bidirectional LiDAR-Camera Semantic Segmentation.

Authors

Sun Tianfang, Zhang Zhizhong, Tan Xin, Peng Yong, Qu Yanyun, Xie Yuan

Publication

IEEE Trans Pattern Anal Mach Intell. 2024 Dec;46(12):11059-11072. doi: 10.1109/TPAMI.2024.3451658. Epub 2024 Nov 6.

DOI: 10.1109/TPAMI.2024.3451658
PMID: 39208046
Abstract

Combining LiDAR points and images for robust semantic segmentation has shown great potential. However, the heterogeneity between the two modalities (e.g., the density, the field of view) poses challenges in establishing a bijective mapping between each point and pixel. This modality alignment problem introduces new challenges in network design and data processing for cross-modal methods. Specifically, 1) some points project outside the image planes and have no corresponding pixels; 2) the complexity of maintaining geometric consistency limits the deployment of many data augmentation techniques. To address these challenges, we propose a cross-modal knowledge imputation and transition approach. First, we introduce a bidirectional feature fusion strategy that imputes missing image features and performs cross-modal fusion simultaneously. This allows us to generate reliable predictions even when images are missing. Second, we propose a Uni-to-Multi modal Knowledge Distillation (U2MKD) framework, leveraging the transfer of informative features from a single-modality teacher to a cross-modality student. This overcomes the issues of augmentation misalignment and enables us to train the student effectively. Extensive experiments on the nuScenes, Waymo, and SemanticKITTI datasets demonstrate the effectiveness of our approach. Notably, our method achieves an 8.3 mIoU gain over the LiDAR-only baseline on the nuScenes validation set and achieves state-of-the-art performance on the three datasets.
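The abstract describes transferring knowledge from a single-modality (LiDAR-only) teacher to a cross-modality student. As an illustration only, not the authors' U2MKD implementation, the standard building block of such frameworks is a temperature-scaled KL-divergence distillation loss between teacher and student class logits; a minimal NumPy sketch (all names and shapes here are hypothetical):

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax over the last axis (classes).
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(teacher_logits, student_logits, T=2.0):
    # KL(teacher || student) averaged over points, scaled by T^2
    # as is conventional in knowledge distillation.
    p = softmax(teacher_logits, T)  # teacher soft targets
    q = softmax(student_logits, T)  # student predictions
    kl = (p * (np.log(p + 1e-12) - np.log(q + 1e-12))).sum(axis=-1)
    return float(kl.mean() * T * T)

# Toy example: 4 LiDAR points, 3 semantic classes.
rng = np.random.default_rng(0)
teacher = rng.normal(size=(4, 3))  # logits from the LiDAR-only teacher
student = rng.normal(size=(4, 3))  # logits from the cross-modal student
loss = distillation_loss(teacher, student)
```

Minimizing this loss pulls the student's per-point class distribution toward the teacher's; because it is computed per point rather than per pixel, it sidesteps the point-to-pixel alignment problem the abstract highlights for augmented inputs.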


Similar Articles

1. Uni-to-Multi Modal Knowledge Distillation for Bidirectional LiDAR-Camera Semantic Segmentation.
IEEE Trans Pattern Anal Mach Intell. 2024 Dec;46(12):11059-11072. doi: 10.1109/TPAMI.2024.3451658. Epub 2024 Nov 6.
2. A modality-collaborative convolution and transformer hybrid network for unpaired multi-modal medical image segmentation with limited annotations.
Med Phys. 2023 Sep;50(9):5460-5478. doi: 10.1002/mp.16338. Epub 2023 Mar 15.
3. MHD-Net: Memory-Aware Hetero-Modal Distillation Network for Thymic Epithelial Tumor Typing With Missing Pathology Modality.
IEEE J Biomed Health Inform. 2024 May;28(5):3003-3014. doi: 10.1109/JBHI.2024.3376462. Epub 2024 May 6.
4. EPMF: Efficient Perception-Aware Multi-Sensor Fusion for 3D Semantic Segmentation.
IEEE Trans Pattern Anal Mach Intell. 2024 Dec;46(12):8258-8273. doi: 10.1109/TPAMI.2024.3402232. Epub 2024 Nov 6.
5. A 3D hierarchical cross-modality interaction network using transformers and convolutions for brain glioma segmentation in MR images.
Med Phys. 2024 Nov;51(11):8371-8389. doi: 10.1002/mp.17354. Epub 2024 Aug 13.
6. Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR-Based Perception.
IEEE Trans Pattern Anal Mach Intell. 2022 Oct;44(10):6807-6822. doi: 10.1109/TPAMI.2021.3098789. Epub 2022 Sep 14.
7. PTA-Det: Point Transformer Associating Point Cloud and Image for 3D Object Detection.
Sensors (Basel). 2023 Mar 17;23(6):3229. doi: 10.3390/s23063229.
8. CBG-Net: Cross-modality and cross-scale balance network with global semantics for multi-modal 3D object detection.
Neural Netw. 2024 Nov;179:106535. doi: 10.1016/j.neunet.2024.106535. Epub 2024 Jul 14.
9. SwinCross: Cross-modal Swin transformer for head-and-neck tumor segmentation in PET/CT images.
Med Phys. 2024 Mar;51(3):2096-2107. doi: 10.1002/mp.16703. Epub 2023 Sep 30.
10. FCKDNet: A Feature Condensation Knowledge Distillation Network for Semantic Segmentation.
Entropy (Basel). 2023 Jan 7;25(1):125. doi: 10.3390/e25010125.

Cited By

1. Counterclockwise block-by-block knowledge distillation for neural network compression.
Sci Rep. 2025 Apr 3;15(1):11369. doi: 10.1038/s41598-025-91152-3.