Temporal Pixel-Level Semantic Understanding Through the VSPW Dataset.

Authors

Miao Jiaxu, Wei Yunchao, Wang Xiaohan, Yang Yi

Publication information

IEEE Trans Pattern Anal Mach Intell. 2023 Sep;45(9):11297-11308. doi: 10.1109/TPAMI.2023.3266023. Epub 2023 Aug 7.

DOI:10.1109/TPAMI.2023.3266023
PMID:37037230
Abstract

Scene understanding through pixel-level semantic parsing is one of the main problems in computer vision. Till now, image-based methods and datasets for scene parsing have been well explored. However, the real world is naturally dynamic instead of a static state. Thus, learning to perform video scene parsing is more practical for real-world applications. Considering that few datasets cover an extensive range of scenes and object categories with temporal pixel-level annotations, in this work, we present a large-scale video scene parsing dataset, namely VSPW (Video Scene Parsing in the Wild). To be specific, there are a total of 251,633 frames from 3,536 videos with densely pixel-wise annotations in VSPW, including a large variety of 231 scenes and 124 object categories. Besides, VSPW is densely annotated with a high frame rate of 15 f/s, and over 96% of videos from VSPW have high spatial resolutions from 720P to 4K. To the best of our knowledge, VSPW is the first attempt to address the challenging video scene parsing task in the wild by considering diverse scenes. Based on our VSPW, we further propose Temporal Attention Blending (TAB) Networks to harness temporal context information for better pixel-level semantic understanding of videos. Extensive experiments on VSPW well demonstrate the superiority of the proposed TAB over other baseline approaches. We hope the new proposed dataset and the explorations in this work can help advance the challenging yet practical video scene parsing task in the future. Both the dataset and the code are available at www.vspwdataset.com.
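
To put the dataset statistics quoted in the abstract in perspective, the following minimal sketch derives the average annotated clip length from the stated totals. It assumes that every annotated frame is sampled at the stated 15 f/s and that frames are spread across the videos; the abstract reports only the totals, so the per-video figures are illustrative averages, and the function and constant names are hypothetical.

```python
# Figures quoted in the abstract.
TOTAL_FRAMES = 251_633   # annotated frames in VSPW
TOTAL_VIDEOS = 3_536     # annotated videos in VSPW
FRAME_RATE = 15.0        # annotation frame rate in frames per second


def average_clip_stats(total_frames: int, total_videos: int, fps: float) -> tuple[float, float]:
    """Return (mean frames per video, mean seconds per video).

    Assumption: all frames are annotated at a uniform rate of `fps`.
    """
    frames_per_video = total_frames / total_videos
    seconds_per_video = frames_per_video / fps
    return frames_per_video, seconds_per_video


frames_per_video, seconds_per_video = average_clip_stats(
    TOTAL_FRAMES, TOTAL_VIDEOS, FRAME_RATE
)
# Total annotated footage, in hours, under the same uniform-rate assumption.
total_hours = TOTAL_FRAMES / FRAME_RATE / 3600

print(f"mean frames per video:  {frames_per_video:.1f}")
print(f"mean seconds per video: {seconds_per_video:.1f}")
print(f"total annotated hours:  {total_hours:.2f}")
```

Under these assumptions the average clip holds roughly 71 annotated frames, which is a little under five seconds of footage per video.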

