Visual event recognition in videos by learning from Web data.

Affiliation

Nanyang Technological University, N4-02a-29, Nanyang Avenue, Singapore 639798.

Publication information

IEEE Trans Pattern Anal Mach Intell. 2012 Sep;34(9):1667-80. doi: 10.1109/TPAMI.2011.265.

DOI: 10.1109/TPAMI.2011.265
PMID: 22201057
Abstract

We propose a visual event recognition framework for consumer videos by leveraging a large amount of loosely labeled web videos (e.g., from YouTube). Observing that consumer videos generally contain large intraclass variations within the same type of events, we first propose a new method, called Aligned Space-Time Pyramid Matching (ASTPM), to measure the distance between any two video clips. Second, we propose a new transfer learning method, referred to as Adaptive Multiple Kernel Learning (A-MKL), in order to 1) fuse the information from multiple pyramid levels and features (i.e., space-time features and static SIFT features) and 2) cope with the considerable variation in feature distributions between videos from two domains (i.e., web video domain and consumer video domain). For each pyramid level and each type of local features, we first train a set of SVM classifiers based on the combined training set from two domains by using multiple base kernels from different kernel types and parameters, which are then fused with equal weights to obtain a prelearned average classifier. In A-MKL, for each event class we learn an adapted target classifier based on multiple base kernels and the prelearned average classifiers from this event class or all the event classes by minimizing both the structural risk functional and the mismatch between data distributions of two domains. Extensive experiments demonstrate the effectiveness of our proposed framework that requires only a small number of labeled consumer videos by leveraging web data. We also conduct an in-depth investigation on various aspects of the proposed method A-MKL, such as the analysis on the combination coefficients on the prelearned classifiers, the convergence of the learning algorithm, and the performance variation by using different proportions of labeled consumer videos. 
Moreover, we show that A-MKL using the prelearned classifiers from all the event classes leads to better performance when compared with A-MKL using the prelearned classifiers only from each individual event class.
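
As a rough illustration of two ideas in the abstract, the sketch below shows (1) equal-weight fusion of several classifiers' outputs into an average classifier and (2) a kernel-based measure of the mismatch between two domains. The abstract does not name the mismatch measure; Maximum Mean Discrepancy (MMD) is used here as an assumption, as are the Gaussian base kernels, the bandwidths in `gammas`, the fixed kernel weights, and all function names. The data are synthetic stand-ins for video features, not results from the paper.

```python
import numpy as np

def rbf_kernel(X, Y, gamma):
    """Gaussian base kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq = (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def combined_kernel(X, Y, gammas, weights):
    """Convex combination of base kernels (weights are fixed here, not learned)."""
    weights = np.asarray(weights, dtype=float)
    assert np.all(weights >= 0) and np.isclose(weights.sum(), 1.0)
    return sum(w * rbf_kernel(X, Y, g) for w, g in zip(weights, gammas))

def mmd_squared(Xs, Xt, gammas, weights):
    """Biased estimate of squared MMD between two samples (assumed mismatch measure)."""
    k_ss = combined_kernel(Xs, Xs, gammas, weights).mean()
    k_tt = combined_kernel(Xt, Xt, gammas, weights).mean()
    k_st = combined_kernel(Xs, Xt, gammas, weights).mean()
    return k_ss + k_tt - 2.0 * k_st

def average_classifier(decision_values):
    """Equal-weight fusion of classifier outputs (rows = classifiers, columns = samples)."""
    return np.mean(np.asarray(decision_values, dtype=float), axis=0)

# Synthetic features: a "web" sample, a shifted "consumer" sample, and an unshifted control.
rng = np.random.default_rng(0)
web = rng.normal(0.0, 1.0, size=(200, 5))
consumer = rng.normal(1.0, 1.0, size=(50, 5))
control = rng.normal(0.0, 1.0, size=(50, 5))

gammas = [0.05, 0.2, 1.0]        # hypothetical kernel bandwidths
weights = [1 / 3, 1 / 3, 1 / 3]  # hypothetical equal kernel weights

shifted_gap = mmd_squared(web, consumer, gammas, weights)
matched_gap = mmd_squared(web, control, gammas, weights)
fused = average_classifier([[1.0, -1.0], [3.0, 1.0]])
```

In this toy setting `shifted_gap` is larger than `matched_gap`, which is the kind of quantity a domain adaptation method would try to keep small while also fitting the labeled data. A-MKL as described in the abstract additionally learns the kernel combination and the target classifier jointly, which this sketch does not attempt.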

