

Tensor Representations for Action Recognition.

Publication information

IEEE Trans Pattern Anal Mach Intell. 2022 Feb;44(2):648-665. doi: 10.1109/TPAMI.2021.3107160. Epub 2022 Jan 7.

DOI: 10.1109/TPAMI.2021.3107160
PMID: 34428136
Abstract

Human actions in video sequences are characterized by the complex interplay between spatial features and their temporal dynamics. In this paper, we propose novel tensor representations for compactly capturing such higher-order relationships between visual features for the task of action recognition. We propose two tensor-based feature representations, viz. (i) the sequence compatibility kernel (SCK) and (ii) the dynamics compatibility kernel (DCK). SCK builds on the spatio-temporal correlations between features, whereas DCK explicitly models the action dynamics of a sequence. We also explore a generalization of SCK, coined SCK⊕, that operates on subsequences to capture the local-global interplay of correlations and can incorporate multi-modal inputs, e.g., skeleton 3D body-joints and per-frame classifier scores obtained from deep-learning models trained on videos. We introduce linearizations of these kernels that lead to compact and fast descriptors. We provide experiments on (i) 3D skeleton action sequences, (ii) fine-grained video sequences, and (iii) standard non-fine-grained videos. As our final representations are tensors that capture higher-order relationships of features, they relate to co-occurrences for robust fine-grained recognition (Lin, 2017), (Koniusz, 2018). We use higher-order tensors and so-called Eigenvalue Power Normalization (EPN), which has long been speculated to perform spectral detection of higher-order occurrences (Koniusz, 2013), (Koniusz, 2017), thus detecting fine-grained relationships of features rather than merely counting features in action sequences. We prove that a tensor of order r, built from Z-dimensional features and coupled with EPN, indeed detects whether at least one higher-order occurrence is 'projected' into one of its [Formula: see text] subspaces of dim. r represented by the tensor, thus forming a Tensor Power Normalization metric endowed with [Formula: see text] such 'detectors'.
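The Eigenvalue Power Normalization described in the abstract can be illustrated in its simplest, second-order form: aggregate per-frame features into an autocorrelation matrix, then raise its eigenvalues to a power 0 < gamma < 1 so that repeated co-occurrences are damped toward a detection-like response rather than a raw count. The following is a minimal NumPy sketch of that idea only; the function name, `gamma` parameter, and averaging scheme are illustrative assumptions, not the paper's actual implementation (which operates on higher-order tensors via HOSVD).

```python
import numpy as np

def eigenvalue_power_normalization(features, gamma=0.5):
    """Second-order EPN sketch (illustrative, not the paper's code).

    features: (N, Z) array of N per-frame, Z-dimensional features.
    Builds the Z x Z autocorrelation matrix and raises its eigenvalues
    to the power gamma, damping frequent co-occurrences so the result
    behaves more like a detector of feature pairs than a counter.
    """
    # Order-2 aggregation: average outer product over the sequence.
    M = features.T @ features / len(features)
    # Spectral decomposition of the symmetric PSD matrix.
    w, V = np.linalg.eigh(M)
    # Clip tiny negative eigenvalues from round-off, then apply the power.
    w = np.clip(w, 0.0, None) ** gamma
    # Reassemble: V @ diag(w) @ V.T.
    return (V * w) @ V.T
```

Flattening the normalized matrix (or, in the paper's higher-order setting, the normalized tensor) then yields the compact descriptor fed to a classifier; with gamma = 0.5 this second-order case reduces to the familiar matrix square root used in bilinear pooling.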


Similar articles

1
Beyond Joints: Learning Representations From Primitive Geometries for Skeleton-Based Action Recognition and Detection.
IEEE Trans Image Process. 2018 Sep;27(9):4382-4394. doi: 10.1109/TIP.2018.2837386.
2
Fine-Grained Video Captioning via Graph-based Multi-Granularity Interaction Learning.
IEEE Trans Pattern Anal Mach Intell. 2022 Feb;44(2):666-683. doi: 10.1109/TPAMI.2019.2946823. Epub 2022 Jan 7.
3
Learning Clip Representations for Skeleton-Based 3D Action Recognition.
IEEE Trans Image Process. 2018 Jun;27(6):2842-2855. doi: 10.1109/TIP.2018.2812099.
4
Linguistic-Driven Partial Semantic Relevance Learning for Skeleton-Based Action Recognition.
Sensors (Basel). 2024 Jul 26;24(15):4860. doi: 10.3390/s24154860.
5
Motion-Driven Visual Tempo Learning for Video-Based Action Recognition.
IEEE Trans Image Process. 2022;31:4104-4116. doi: 10.1109/TIP.2022.3180585. Epub 2022 Jun 20.
6
Modeling Geometric-Temporal Context With Directional Pyramid Co-Occurrence for Action Recognition.
IEEE Trans Image Process. 2014 Feb;23(2):658-72. doi: 10.1109/TIP.2013.2291319.
7
Skeleton-Based Human Action Recognition With Global Context-Aware Attention LSTM Networks.
IEEE Trans Image Process. 2018 Apr;27(4):1586-1599. doi: 10.1109/TIP.2017.2785279.
8
Representation Learning of Temporal Dynamics for Skeleton-Based Action Recognition.
IEEE Trans Image Process. 2016 Jul;25(7):3010-3022. doi: 10.1109/TIP.2016.2552404. Epub 2016 Apr 8.
9
Affective Action and Interaction Recognition by Multi-View Representation Learning from Handcrafted Low-Level Skeleton Features.
Int J Neural Syst. 2022 Oct;32(10):2250040. doi: 10.1142/S012906572250040X. Epub 2022 Jul 25.

Cited by

1
Tensor-powered insights into neural dynamics.
Commun Biol. 2025 Feb 24;8(1):298. doi: 10.1038/s42003-025-07711-x.
2
ST-TGR: Spatio-Temporal Representation Learning for Skeleton-Based Teaching Gesture Recognition.
Sensors (Basel). 2024 Apr 18;24(8):2589. doi: 10.3390/s24082589.
3
Deep Learning for Human Activity Recognition on 3D Human Skeleton: Survey and Comparative Study.
Sensors (Basel). 2023 May 27;23(11):5121. doi: 10.3390/s23115121.