Co-Training for Deep Object Detection: Comparing Single-Modal and Multi-Modal Approaches.

Affiliations

Computer Vision Center (CVC), Universitat Autònoma de Barcelona (UAB), 08193 Bellaterra, Spain.

Computer Science Department, Universitat Autònoma de Barcelona (UAB), 08193 Bellaterra, Spain.

Publication Information

Sensors (Basel). 2021 May 4;21(9):3185. doi: 10.3390/s21093185.

DOI: 10.3390/s21093185
PMID: 34064323
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8125436/
Abstract

Top-performing computer vision models are powered by convolutional neural networks (CNNs). Training an accurate CNN highly depends on both the raw sensor data and their associated ground truth (GT). Collecting such GT is usually done through human labeling, which is time-consuming and does not scale as we wish. This data-labeling bottleneck may be intensified due to domain shifts among image sensors, which could force per-sensor data labeling. In this paper, we focus on the use of co-training, a semi-supervised learning (SSL) method, for obtaining self-labeled object bounding boxes (BBs), i.e., the GT to train deep object detectors. In particular, we assess the goodness of multi-modal co-training by relying on two different views of an image, namely, appearance (RGB) and estimated depth (D). Moreover, we compare appearance-based single-modal co-training with multi-modal. Our results suggest that in a standard SSL setting (no domain shift, a few human-labeled data) and under virtual-to-real domain shift (many virtual-world labeled data, no human-labeled data) multi-modal co-training outperforms single-modal. In the latter case, by performing GAN-based domain translation both co-training modalities are on par, at least when using an off-the-shelf depth estimation model not specifically trained on the translated images.


[Figures 1–8 of the article (sensors-21-03185-g001 through g008); full-size images available via the PMC full text link above.]

Similar Articles

1. Co-Training for Deep Object Detection: Comparing Single-Modal and Multi-Modal Approaches.
   Sensors (Basel). 2021 May 4;21(9):3185. doi: 10.3390/s21093185.
2. Multi-Modal Deep Learning for Weeds Detection in Wheat Field Based on RGB-D Images.
   Front Plant Sci. 2021 Nov 5;12:732968. doi: 10.3389/fpls.2021.732968. eCollection 2021.
3. RGB-D Object Recognition Using Multi-Modal Deep Neural Network and DS Evidence Theory.
   Sensors (Basel). 2019 Jan 27;19(3):529. doi: 10.3390/s19030529.
4. Co-Labeling for Multi-View Weakly Labeled Learning.
   IEEE Trans Pattern Anal Mach Intell. 2016 Jun;38(6):1113-25. doi: 10.1109/TPAMI.2015.2476813. Epub 2015 Sep 4.
5. DeepFruits: A Fruit Detection System Using Deep Neural Networks.
   Sensors (Basel). 2016 Aug 3;16(8):1222. doi: 10.3390/s16081222.
6. A Multi-Modal, Discriminative and Spatially Invariant CNN for RGB-D Object Labeling.
   IEEE Trans Pattern Anal Mach Intell. 2018 Sep;40(9):2051-2065. doi: 10.1109/TPAMI.2017.2747134. Epub 2017 Aug 30.
7. Discriminative Cross-Modal Transfer Learning and Densely Cross-Level Feedback Fusion for RGB-D Salient Object Detection.
   IEEE Trans Cybern. 2020 Nov;50(11):4808-4820. doi: 10.1109/TCYB.2019.2934986. Epub 2019 Aug 30.
8. Bifurcated Backbone Strategy for RGB-D Salient Object Detection.
   IEEE Trans Image Process. 2021;30:8727-8742. doi: 10.1109/TIP.2021.3116793. Epub 2021 Oct 26.
9. Weakly Aligned Feature Fusion for Multimodal Object Detection.
   IEEE Trans Neural Netw Learn Syst. 2025 Mar;36(3):4145-4159. doi: 10.1109/TNNLS.2021.3105143. Epub 2025 Feb 28.
10. Co-trained convolutional neural networks for automated detection of prostate cancer in multi-parametric MRI.
   Med Image Anal. 2017 Dec;42:212-227. doi: 10.1016/j.media.2017.08.006. Epub 2017 Aug 24.

Cited By

1. Co-Training for Unsupervised Domain Adaptation of Semantic Segmentation Models.
   Sensors (Basel). 2023 Jan 5;23(2):621. doi: 10.3390/s23020621.
2. A Cotraining-Based Semisupervised Approach for Remaining-Useful-Life Prediction of Bearings.
   Sensors (Basel). 2022 Oct 13;22(20):7766. doi: 10.3390/s22207766.

References

1. Co-Training for Visual Object Recognition Based on Self-Supervised Models Using a Cross-Entropy Regularization.
   Entropy (Basel). 2021 Apr 1;23(4):423. doi: 10.3390/e23040423.
2. Generating Accurate Pseudo-labels in Semi-Supervised Learning and Avoiding Overconfident Predictions via Hermite Polynomial Activations.
   Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit. 2020 Jun;2020:11432-11440. doi: 10.1109/cvpr42600.2020.01145. Epub 2020 Aug 5.
3. Recognition and Localization Methods for Vision-Based Fruit Picking Robots: A Review.
   Front Plant Sci. 2020 May 19;11:510. doi: 10.3389/fpls.2020.00510. eCollection 2020.
4. Deep High-Resolution Representation Learning for Visual Recognition.
   IEEE Trans Pattern Anal Mach Intell. 2021 Oct;43(10):3349-3364. doi: 10.1109/TPAMI.2020.2983686. Epub 2021 Sep 2.