
Attention-Guided Disentangled Feature Aggregation for Video Object Detection.

Affiliations

Department of Computer Science, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany.

Mindgarage, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany.

Publication Information

Sensors (Basel). 2022 Nov 7;22(21):8583. doi: 10.3390/s22218583.

DOI: 10.3390/s22218583
PMID: 36366281
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9658927/
Abstract

Object detection is a computer vision task that involves localisation and classification of objects in an image. Video data implicitly introduces several challenges, such as blur, occlusion and defocus, making video object detection more challenging in comparison to still image object detection, which is performed on individual and independent images. This paper tackles these challenges by proposing an attention-heavy framework for video object detection that aggregates the disentangled features extracted from individual frames. The proposed framework is a two-stage object detector based on the Faster R-CNN architecture. The disentanglement head integrates scale, spatial and task-aware attention and applies it to the features extracted by the backbone network across all the frames. Subsequently, the aggregation head incorporates temporal attention and improves detection in the target frame by aggregating the features of the support frames. These include the features extracted from the disentanglement network along with the temporal features. We evaluate the proposed framework using the ImageNet VID dataset and achieve a mean Average Precision (mAP) of 49.8 and 52.5 using the backbones of ResNet-50 and ResNet-101, respectively. The improvement in performance over the individual baseline methods validates the efficacy of the proposed approach.
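The aggregation head described above weights support-frame features by their relevance to the target frame before summing them. The core idea can be sketched as generic temporal attention in NumPy (a minimal illustration under stated assumptions, not the authors' implementation; `aggregate_support_frames` and all variable names are invented for this sketch):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the max before exponentiating
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def aggregate_support_frames(target, supports):
    """Attention-weighted aggregation of support-frame features.

    target:   (d,) feature vector of the target frame
    supports: (n, d) feature vectors of the n support frames
    Returns an aggregated (d,) feature for the target frame.
    """
    # Similarity of each support frame to the target, scaled by sqrt(d)
    scores = supports @ target / np.sqrt(target.shape[0])
    weights = softmax(scores)   # temporal attention weights, sum to 1
    return weights @ supports   # weighted sum over support frames

# Toy usage: four support frames that are noisy copies of the target
rng = np.random.default_rng(0)
target = rng.standard_normal(8)
supports = np.stack([target + 0.1 * rng.standard_normal(8) for _ in range(4)])
agg = aggregate_support_frames(target, supports)
print(agg.shape)  # (8,)
```

In the paper's full pipeline, the inputs to this step would be the disentangled (scale-, spatial-, and task-aware) features rather than raw backbone features; the sketch only shows the weighting-and-summing mechanic.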


Figures (g001-g009):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3581/9658927/1afdd9633f26/sensors-22-08583-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3581/9658927/2b1fdd934a44/sensors-22-08583-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3581/9658927/1778bf23c78a/sensors-22-08583-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3581/9658927/105331e2561e/sensors-22-08583-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3581/9658927/83a91822f53f/sensors-22-08583-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3581/9658927/c2db6f9a28d0/sensors-22-08583-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3581/9658927/05181648d1b4/sensors-22-08583-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3581/9658927/233852d2c824/sensors-22-08583-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3581/9658927/e9d5325b3f54/sensors-22-08583-g009.jpg

Similar Articles

1. Attention-Guided Disentangled Feature Aggregation for Video Object Detection.
Sensors (Basel). 2022 Nov 7;22(21):8583. doi: 10.3390/s22218583.
2. DGRNet: A Dual-Level Graph Relation Network for Video Object Detection.
IEEE Trans Image Process. 2023;32:4128-4141. doi: 10.1109/TIP.2023.3285136. Epub 2023 Jul 19.
3. Deep Spatial-Temporal Joint Feature Representation for Video Object Detection.
Sensors (Basel). 2018 Mar 4;18(3):774. doi: 10.3390/s18030774.
4. ssFPN: Scale Sequence () Feature-Based Feature Pyramid Network for Object Detection.
Sensors (Basel). 2023 Apr 30;23(9):4432. doi: 10.3390/s23094432.
5. Video Captioning with Object-Aware Spatio-Temporal Correlation and Aggregation.
IEEE Trans Image Process. 2020 Apr 27. doi: 10.1109/TIP.2020.2988435.
6. Visual Feature Learning on Video Object and Human Action Detection: A Systematic Review.
Micromachines (Basel). 2021 Dec 31;13(1):72. doi: 10.3390/mi13010072.
7. Object Detection in Videos by High Quality Object Linking.
IEEE Trans Pattern Anal Mach Intell. 2020 May;42(5):1272-1278. doi: 10.1109/TPAMI.2019.2910529. Epub 2019 Apr 11.
8. MINet: Meta-Learning Instance Identifiers for Video Object Detection.
IEEE Trans Image Process. 2021;30:6879-6891. doi: 10.1109/TIP.2021.3099409. Epub 2021 Aug 4.
9. Good view frames from ultrasonography (USG) video containing ONS diameter using state-of-the-art deep learning architectures.
Med Biol Eng Comput. 2022 Dec;60(12):3397-3417. doi: 10.1007/s11517-022-02680-3. Epub 2022 Oct 3.
10. Individual honey bee tracking in a beehive environment using deep learning and Kalman filter.
Sci Rep. 2024 Jan 11;14(1):1061. doi: 10.1038/s41598-023-44718-y.

Cited By

1. LPO-YOLOv5s: A Lightweight Pouring Robot Object Detection Algorithm.
Sensors (Basel). 2023 Jul 14;23(14):6399. doi: 10.3390/s23146399.
2. Disentangled Dynamic Deviation Transformer Networks for Multivariate Time Series Anomaly Detection.
Sensors (Basel). 2023 Jan 18;23(3):1104. doi: 10.3390/s23031104.

References

1. Exploiting Concepts of Instance Segmentation to Boost Detection in Challenging Environments.
Sensors (Basel). 2022 May 12;22(10):3703. doi: 10.3390/s22103703.
2. Survey and Performance Analysis of Deep Learning Based Object Detection in Challenging Environments.
Sensors (Basel). 2021 Jul 28;21(15):5116. doi: 10.3390/s21155116.
3. New Generation Deep Learning for Video Object Detection: A Survey.
IEEE Trans Neural Netw Learn Syst. 2022 Aug;33(8):3195-3215. doi: 10.1109/TNNLS.2021.3053249. Epub 2022 Aug 3.
4. Object detection based on an adaptive attention mechanism.
Sci Rep. 2020 Jul 9;10(1):11307. doi: 10.1038/s41598-020-67529-x.
5. Object Detection With Deep Learning: A Review.
IEEE Trans Neural Netw Learn Syst. 2019 Nov;30(11):3212-3232. doi: 10.1109/TNNLS.2018.2876865. Epub 2019 Jan 28.
6. Computer Vision in Healthcare Applications.
J Healthc Eng. 2018 Mar 4;2018:5157020. doi: 10.1155/2018/5157020. eCollection 2018.