Nanyang Technological University, N4-02a-29, Nanyang Avenue, Singapore 639798.
IEEE Trans Pattern Anal Mach Intell. 2012 Sep;34(9):1667-80. doi: 10.1109/TPAMI.2011.265.
We propose a visual event recognition framework for consumer videos that leverages a large amount of loosely labeled web videos (e.g., from YouTube). Observing that consumer videos generally contain large intraclass variations within the same type of events, we first propose a new method, called Aligned Space-Time Pyramid Matching (ASTPM), to measure the distance between any two video clips. Second, we propose a new transfer learning method, referred to as Adaptive Multiple Kernel Learning (A-MKL), in order to 1) fuse the information from multiple pyramid levels and feature types (i.e., space-time features and static SIFT features) and 2) cope with the considerable variation in feature distributions between videos from the two domains (i.e., the web video domain and the consumer video domain). For each pyramid level and each type of local feature, we first train a set of SVM classifiers on the combined training set from the two domains using multiple base kernels of different kernel types and parameters; these classifiers are then fused with equal weights to obtain a prelearned average classifier. In A-MKL, for each event class we learn an adapted target classifier based on multiple base kernels and the prelearned average classifiers from that event class or from all event classes by minimizing both the structural risk functional and the mismatch between the data distributions of the two domains. Extensive experiments demonstrate the effectiveness of our proposed framework, which, by leveraging web data, requires only a small number of labeled consumer videos. We also conduct an in-depth investigation of various aspects of the proposed A-MKL method, such as an analysis of the combination coefficients of the prelearned classifiers, the convergence of the learning algorithm, and the performance variation when using different proportions of labeled consumer videos. Moreover, we show that A-MKL using the prelearned classifiers from all event classes outperforms A-MKL using the prelearned classifiers from each individual event class only.
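For orientation, the adapted target classifier described in the abstract can be written in the following general form. This is a hedged sketch of the objective's overall shape, assuming Maximum Mean Discrepancy (MMD) as the distribution-mismatch measure; the symbols \beta_p, d_m, \mathbf{w}_m, b, \theta, and \phi_m follow common multiple kernel learning notation and are introduced here for illustration rather than quoted from the paper.

f^T(x) = \sum_p \beta_p \bar{f}_p(x) + \sum_m d_m \mathbf{w}_m^\top \phi_m(x) + b,

\min_{\mathbf{d},\, \boldsymbol{\beta},\, \{\mathbf{w}_m\},\, b} \ \Omega(\mathbf{d}) + \theta\, J(\mathbf{d}, \boldsymbol{\beta}, \{\mathbf{w}_m\}, b), \qquad \Omega(\mathbf{d}) = \Big\| \tfrac{1}{n_A} \sum_{i \in \mathcal{A}} \phi_{\mathbf{d}}(x_i) - \tfrac{1}{n_T} \sum_{i \in \mathcal{T}} \phi_{\mathbf{d}}(x_i) \Big\|^2,

where \bar{f}_p are the prelearned average classifiers, J is a hinge-loss structural risk functional over the labeled training data, and \mathcal{A}, \mathcal{T} index the auxiliary (web) and target (consumer) domain samples.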
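As a concrete illustration of the prelearned average classifier step, the following minimal Python sketch (using scikit-learn) trains one SVM per base kernel on a combined two-domain training set and fuses the decision values with equal weights. The synthetic data, kernel parameters, and helper names are hypothetical stand-ins, not the paper's experimental setup; real inputs would be space-time or SIFT feature representations at one pyramid level.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Stand-ins for the combined training set from the web (auxiliary)
# and consumer (target) domains, with binary event labels in {-1, +1}.
X_web = rng.normal(0.0, 1.0, size=(60, 20))
y_web = rng.integers(0, 2, size=60) * 2 - 1
X_con = rng.normal(0.3, 1.2, size=(20, 20))
y_con = rng.integers(0, 2, size=20) * 2 - 1
X = np.vstack([X_web, X_con])
y = np.concatenate([y_web, y_con])

def rbf(A, B, gamma):
    """Gaussian base kernel k(a, b) = exp(-gamma * ||a - b||^2)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# Multiple base kernels from different kernel parameters
# (illustrative values only).
gammas = [0.01, 0.05, 0.1, 0.5]

# One SVM per base kernel, trained on the combined two-domain set.
svms = []
for g in gammas:
    clf = SVC(kernel="precomputed", C=1.0)
    clf.fit(rbf(X, X, g), y)
    svms.append(clf)

def average_classifier(X_test):
    """Prelearned average classifier: equal-weight fusion of the
    per-kernel SVM decision values."""
    scores = [clf.decision_function(rbf(X_test, X, g))
              for clf, g in zip(svms, gammas)]
    return np.mean(scores, axis=0)

print(average_classifier(X_con[:5]))

Each fused score plays the role of one prelearned average classifier \bar{f}_p(x); A-MKL would then learn the combination coefficients \beta_p and the kernel-based perturbation term on top of these fixed scores.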