Soomro Khurram, Idrees Haroon, Shah Mubarak
IEEE Trans Pattern Anal Mach Intell. 2019 Feb;41(2):459-472. doi: 10.1109/TPAMI.2018.2797266. Epub 2018 Jan 23.
This paper proposes a person-centric, online approach to the challenging problem of localizing and predicting actions and interactions in videos. Typically, localization or recognition is performed offline, with all frames of the video processed together. This prevents timely localization and prediction of actions and interactions, an important consideration for many tasks including surveillance and human-machine interaction. In our approach, we estimate human poses at each frame and train discriminative appearance models using the superpixels inside the pose bounding boxes. Since per-frame pose estimation is inherently noisy, the conditional probability of pose hypotheses at the current time step (frame) is computed from the pose estimates in the current frame and their consistency with poses in previous frames. Next, both the superpixel- and pose-based foreground likelihoods are used to infer the location of actors at each time step through a Conditional Random Field that enforces spatio-temporal smoothness in color, optical flow, motion boundaries, and edges among superpixels. Visual drift is handled by updating the appearance models and refining poses using motion smoothness on joint locations, in an online manner. For online prediction of action/interaction confidences, we propose an approach based on Structural SVM that operates on short video segments and is trained with the objective that the confidence of an action or interaction increases as time passes in a positive training clip. Lastly, we quantify the performance of detection and prediction jointly, and analyze how prediction accuracy varies as a function of the observed portion of the action/interaction at different levels of detection performance. Our experiments on several datasets suggest that, despite using only a few frames to localize actions/interactions at each time instant, we obtain results competitive with state-of-the-art offline methods.
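The training objective described above, that an action's confidence should grow as more of a positive clip is observed, can be sketched in a minimal form. This is not the paper's Structural SVM solver; the linear cumulative-mean scoring model, the hinge margin, the subgradient update, and all function names here are illustrative assumptions:

```python
import numpy as np

def clip_scores(w, feats):
    """Confidence after observing segments 1..t (hypothetical linear model
    over the running mean of per-segment features)."""
    cum = np.cumsum(feats, axis=0) / np.arange(1, len(feats) + 1)[:, None]
    return cum @ w

def monotonicity_loss(w, feats, margin=0.1):
    """Hinge penalty whenever confidence fails to rise by `margin`
    between consecutive time steps of a positive clip."""
    s = clip_scores(w, feats)
    return np.maximum(0.0, margin - np.diff(s)).sum()

def train(pos_clips, dim, lr=0.05, epochs=200, margin=0.1, seed=0):
    """Subgradient descent on the monotonicity hinge (illustrative stand-in
    for the paper's Structural SVM training)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.01, size=dim)
    for _ in range(epochs):
        for feats in pos_clips:
            cum = np.cumsum(feats, axis=0) / np.arange(1, len(feats) + 1)[:, None]
            s = cum @ w
            viol = (margin - np.diff(s)) > 0        # violated step pairs
            grad = np.zeros_like(w)
            for t in np.nonzero(viol)[0]:
                grad -= cum[t + 1] - cum[t]         # push s[t+1] - s[t] upward
            w -= lr * (grad + 1e-3 * w)             # small L2 regularizer
    return w

# Synthetic positive clip: action evidence (first feature) grows over time.
feats = np.outer(np.linspace(0.0, 1.0, 5), np.array([1.0, 0.0, 0.0]))
w = train([feats], dim=3)
```

After training on such a clip, the per-step scores become (near-)monotonically increasing, mirroring the intended behavior that confidence accumulates as more of the action is observed.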