Semantic Pooling for Complex Event Analysis in Untrimmed Videos.

Publication information

IEEE Trans Pattern Anal Mach Intell. 2017 Aug;39(8):1617-1632. doi: 10.1109/TPAMI.2016.2608901. Epub 2016 Sep 13.

Abstract

Pooling plays an important role in generating a discriminative video representation. In this paper, we propose a new semantic pooling approach for challenging event analysis tasks (e.g., event detection, recognition, and recounting) in long untrimmed Internet videos, especially when only a few shots/segments are relevant to the event of interest while many other shots are irrelevant or even misleading. Commonly adopted pooling strategies aggregate all shots indiscriminately in one way or another, which loses a great deal of information. Instead, we first define a novel notion of semantic saliency that assesses the relevance of each shot to the event of interest. We then prioritize the shots according to their saliency scores, since semantically more salient shots are expected to contribute more to the final event analysis. Next, we propose a new isotonic regularizer that exploits the constructed semantic ordering. The resulting nearly-isotonic support vector machine (SVM) classifier exhibits higher discriminative power in event analysis tasks. Computationally, we develop an efficient implementation using the proximal gradient algorithm, and we prove new closed-form proximal steps. We conduct extensive experiments on three real-world video datasets and achieve promising improvements.
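The abstract names the ingredients (per-shot saliency, a saliency ordering, an isotonic regularizer, proximal gradient) but not their formulas, so the following Python is only a minimal sketch under stated assumptions, not the authors' method. It assumes each shot already has a scalar detector score and a saliency score, sorts shots by decreasing saliency, and trains a linear classifier with a squared hinge loss whose weights must be nonincreasing along that order (more salient shots weigh at least as much). That hard constraint is swapped in for the paper's nearly-isotonic regularizer; its proximal step is the Euclidean projection onto the nonincreasing cone, computed in closed form by pool-adjacent-violators. The toy data, the loss, and the names `order_by_saliency`, `project_nonincreasing`, `train_ordered_svm`, and `decision` are all hypothetical.

```python
from typing import List, Sequence, Tuple


def order_by_saliency(shot_scores: Sequence[float], saliency: Sequence[float], k: int) -> List[float]:
    """Reorder one video's per-shot scores by decreasing saliency and keep the top k.

    Videos with fewer than k shots are zero-padded, so entry j is always the
    score of the j-th most salient shot.
    """
    order = sorted(range(len(saliency)), key=lambda j: saliency[j], reverse=True)
    ordered = [float(shot_scores[j]) for j in order[:k]]
    return ordered + [0.0] * (k - len(ordered))


def project_nonincreasing(v: Sequence[float]) -> List[float]:
    """Euclidean projection onto {w : w[0] >= w[1] >= ...} by pool-adjacent-violators."""
    blocks: List[List[float]] = []  # each block is [sum, count]
    for x in v:
        blocks.append([float(x), 1.0])
        # Merge while an earlier block's mean is below the next one (order violated).
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] < blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    out: List[float] = []
    for s, c in blocks:
        out.extend([s / c] * int(c))
    return out


def train_ordered_svm(X: Sequence[Sequence[float]], y: Sequence[int], mu: float = 0.01,
                      step: float = 0.2, iters: int = 2000) -> Tuple[List[float], float]:
    """Proximal gradient on squared-hinge loss + (mu/2)||w||^2, with w nonincreasing.

    ASSUMPTION: the hard monotonicity constraint stands in for the paper's
    nearly-isotonic regularizer. Its prox is the projection above; the bias b
    is unconstrained. `step` must not exceed 1/L for the smooth part (0.2 is
    safe for the toy data below).
    """
    n, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(iters):
        gw = [mu * wj for wj in w]
        gb = 0.0
        for xi, yi in zip(X, y):
            slack = 1.0 - yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if slack > 0:  # only margin violators contribute to the gradient
                coef = -2.0 * slack * yi / n
                for j in range(k):
                    gw[j] += coef * xi[j]
                gb += coef
        # Gradient step on the smooth part, then the proximal (projection) step on w.
        w = project_nonincreasing([wj - step * g for wj, g in zip(w, gw)])
        b -= step * gb
    return w, b


def decision(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Linear decision value for one saliency-ordered video vector."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


# Hypothetical toy data: (per-shot detector scores, per-shot saliency, label).
# The third video has a high score only on its least salient (misleading) shot.
VIDEOS = [
    ([0.9, 0.1, 0.2], [0.9, 0.1, 0.3], +1),
    ([0.2, 0.8, 0.1], [0.2, 0.7, 0.1], +1),
    ([0.1, 0.2, 0.9], [0.8, 0.3, 0.1], -1),
    ([0.1, 0.2, 0.1], [0.5, 0.4, 0.3], -1),
]

if __name__ == "__main__":
    X = [order_by_saliency(s, sal, k=3) for s, sal, _ in VIDEOS]
    y = [label for _, _, label in VIDEOS]
    w, b = train_ordered_svm(X, y)
    print("weights (most to least salient):", [round(v, 3) for v in w], "bias:", round(b, 3))
    print("decisions:", [round(decision(w, b, x), 3) for x in X])
```

On this toy data, mean pooling maps the first and third videos to the same value (0.4) and max pooling does too (0.9), so neither can separate them; ordering shots by saliency first preserves the difference, because the third video's high score sits on its least salient shot.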

Similar articles

1
Keyframe extraction from laparoscopic videos based on visual saliency detection.
Comput Methods Programs Biomed. 2018 Oct;165:13-23. doi: 10.1016/j.cmpb.2018.07.004. Epub 2018 Jul 18.
2
Visual event recognition in videos by learning from Web data.
IEEE Trans Pattern Anal Mach Intell. 2012 Sep;34(9):1667-80. doi: 10.1109/TPAMI.2011.265.
3
Submodular Attribute Selection for Visual Recognition.
IEEE Trans Pattern Anal Mach Intell. 2017 Nov;39(11):2242-2255. doi: 10.1109/TPAMI.2016.2636827. Epub 2016 Dec 7.
4
Close Human Interaction Recognition Using Patch-Aware Models.
IEEE Trans Image Process. 2016 Jan;25(1):167-78. doi: 10.1109/TIP.2015.2498410. Epub 2015 Nov 5.
5
Classification approach for automatic laparoscopic video database organization.
Int J Comput Assist Radiol Surg. 2015 Sep;10(9):1449-60. doi: 10.1007/s11548-015-1183-4. Epub 2015 Apr 7.
6
Deep Attention Network for Egocentric Action Recognition.
IEEE Trans Image Process. 2019 Aug;28(8):3703-3713. doi: 10.1109/TIP.2019.2901707. Epub 2019 Feb 26.
7
Explicit modeling of human-object interactions in realistic videos.
IEEE Trans Pattern Anal Mach Intell. 2013 Apr;35(4):835-48. doi: 10.1109/TPAMI.2012.175.

Cited by

1
Open-Environment Robotic Acoustic Perception for Object Recognition.
Front Neurorobot. 2019 Nov 22;13:96. doi: 10.3389/fnbot.2019.00096. eCollection 2019.
