GRAM, Department of Signal Theory and Communications, University of Alcalá, 28805 Alcalá de Henares, Spain.
Sensors (Basel). 2020 May 22;20(10):2953. doi: 10.3390/s20102953.
In this work, we introduce an intelligent video sensor for the problem of Action Proposals (AP). AP consists of localizing the temporal segments of untrimmed videos that are likely to contain actions. Solving this problem can accelerate several video action understanding tasks, such as detection, retrieval, or indexing. All previous AP approaches are supervised and offline, i.e., they need both the temporal annotations of the datasets during training and access to the whole video to cast their proposals. We propose here a new approach that, unlike the rest of the state-of-the-art models, is unsupervised: it never sees any labeled data during learning, nor does it use any features pre-trained on the dataset at hand. Moreover, our approach operates in an online manner, which is beneficial for many real-world applications where the video has to be processed as soon as it arrives at the sensor, e.g., robotics or video monitoring. The core of our method is a Support Vector Classifier (SVC) module that produces candidate AP segments by distinguishing between sets of contiguous video frames. We further propose a mechanism to refine and filter those candidate segments; this filter optimizes a learning-to-rank formulation over the dynamics of the segments. An extensive experimental evaluation is conducted on the Thumos'14 and ActivityNet datasets and, to the best of our knowledge, this work constitutes the first unsupervised approach evaluated on these main AP benchmarks. Finally, we also provide a thorough comparison with the current state-of-the-art supervised AP approaches. We achieve 41% and 59% of the performance of the best supervised model on ActivityNet and Thumos'14, respectively, confirming our unsupervised solution as a viable option for tackling the AP problem. The code to reproduce all our results will be publicly released upon acceptance of the paper.
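The abstract describes the SVC-based candidate generation only at a high level. The snippet below is a minimal, hypothetical sketch (not the authors' implementation) of the underlying idea: fit a classifier to separate a "past" window of contiguous frames from an "incoming" window, and flag a candidate segment boundary whenever the two sets are easily separable. The per-frame features, the window size win, the separability threshold thresh, and the helper propose_boundaries are assumptions introduced here for illustration only.

    # Hedged sketch, not the paper's code: SVC separating two sets of
    # contiguous frames as a cue for candidate segment boundaries.
    import numpy as np
    from sklearn.svm import SVC

    def propose_boundaries(frame_feats, win=16, thresh=0.9):
        """Scan per-frame features in temporal order and return time indices
        where an SVC cleanly separates the 'past' window from the 'incoming'
        window of contiguous frames (a proxy for a content change)."""
        boundaries = []
        for t in range(win, len(frame_feats) - win):
            past = frame_feats[t - win:t]        # frames already observed
            incoming = frame_feats[t:t + win]    # newly arrived frames
            X = np.vstack([past, incoming])
            y = np.array([0] * win + [1] * win)
            clf = SVC(kernel="linear", C=1.0)
            clf.fit(X, y)
            # High training accuracy means the two frame sets are easily
            # separable, suggesting a candidate AP segment boundary at t.
            if clf.score(X, y) >= thresh:
                boundaries.append(t)
        return boundaries

    # Toy usage: random 64-d features with an abrupt change at frame 100.
    feats = np.vstack([np.random.randn(100, 64),
                       np.random.randn(100, 64) + 3.0])
    print(propose_boundaries(feats)[:5])

In practice, the refinement and filtering stage described in the abstract (the learning-to-rank formulation over segment dynamics) would operate on the candidate segments delimited by such boundaries; it is not sketched here.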