VPN++: Rethinking Video-Pose Embeddings for Understanding Activities of Daily Living.

Publication information

IEEE Trans Pattern Anal Mach Intell. 2022 Dec;44(12):9703-9717. doi: 10.1109/TPAMI.2021.3127885. Epub 2022 Nov 7.

DOI: 10.1109/TPAMI.2021.3127885
PMID: 34767506

Abstract

Many attempts have been made to combine RGB and 3D poses for recognizing Activities of Daily Living (ADL). ADL may look very similar, and distinguishing them often requires modeling fine-grained details. Because recent 3D ConvNets are too rigid to capture the subtle visual patterns across an action, this research direction is dominated by methods combining RGB and 3D poses. However, computing 3D poses from an RGB stream is costly in the absence of appropriate sensors, which limits the use of these approaches in real-world applications requiring low latency. How, then, can 3D poses best be exploited for recognizing ADL? To this end, we propose an extension of a pose-driven attention mechanism, the Video-Pose Network (VPN), exploring two distinct directions. One transfers pose knowledge into RGB through feature-level distillation; the other mimics pose-driven attention through attention-level distillation. Finally, these two approaches are integrated into a single model, which we call VPN++. It is worth noting that VPN++ exploits the pose embeddings at training time via distillation but not at inference. We show that VPN++ is not only effective but also provides a substantial speed-up and high resilience to noisy poses. VPN++, with or without 3D poses, outperforms the representative baselines on 4 public datasets. Code is available at https://github.com/srijandas07/vpnplusplus.
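
The two distillation directions named in the abstract can be illustrated with a minimal sketch. The abstract does not state the loss functions used, so everything below is an assumption made for illustration: the feature-level term is taken to be a mean squared error between an RGB (student) feature vector and a pose (teacher) embedding, the attention-level term is taken to be a KL divergence between softmax-normalized attention scores, and the names `distillation_loss`, `alpha`, and `beta` are hypothetical. It is not the implementation from the linked repository.

```python
import math


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def distillation_loss(student_feat, teacher_feat,
                      student_attn_scores, teacher_attn_scores,
                      alpha=1.0, beta=1.0):
    """Hypothetical combined loss (assumed form, not the paper's).

    Feature-level term: pull the RGB (student) feature toward the
    pose (teacher) embedding.
    Attention-level term: make the student's attention distribution
    mimic the pose-driven (teacher) attention distribution.
    """
    feature_term = mse(student_feat, teacher_feat)
    attention_term = kl_divergence(softmax(teacher_attn_scores),
                                   softmax(student_attn_scores))
    return alpha * feature_term + beta * attention_term


# Training-time usage: the teacher inputs come from the pose branch.
loss = distillation_loss(
    student_feat=[0.2, 0.4, 0.1],
    teacher_feat=[0.3, 0.5, 0.0],
    student_attn_scores=[0.1, 0.9, 0.3],
    teacher_attn_scores=[0.0, 1.2, 0.2],
)
print(loss)
```

In this sketch the teacher inputs appear only inside the loss, which mirrors the abstract's point that pose embeddings are used during training but are not needed at inference.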
