Modeling 4D Human-Object Interactions for Joint Event Segmentation, Recognition, and Object Localization.

Publication information

IEEE Trans Pattern Anal Mach Intell. 2017 Jun;39(6):1165-1179. doi: 10.1109/TPAMI.2016.2574712. Epub 2016 Jun 1.

Abstract

In this paper, we present a 4D human-object interaction (4DHOI) model for solving three vision tasks jointly: i) event segmentation from a video sequence, ii) event recognition and parsing, and iii) contextual object localization. The 4DHOI model represents the geometric, temporal, and semantic relations in daily events involving human-object interactions. In 3D space, the interactions of human poses and contextual objects are modeled by semantic co-occurrence and geometric compatibility. On the time axis, the interactions are represented as a sequence of atomic event transitions with coherent objects. The 4DHOI model is a hierarchical spatial-temporal graph representation which can be used for inferring scene functionality and object affordance. The graph structures and parameters are learned using an ordered expectation maximization algorithm which mines the spatial-temporal structures of events from RGB-D video samples. Given an input RGB-D video, the inference is performed by a dynamic programming beam search algorithm which simultaneously carries out event segmentation, recognition, and object localization. We collected a large multiview RGB-D event dataset which contains 3,815 video sequences and 383,036 RGB-D frames captured by three RGB-D cameras. The experimental results on three challenging datasets demonstrate the strength of the proposed method.
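The inference step described above, a dynamic programming beam search that segments a video and labels each segment in one pass, can be illustrated with a minimal Python sketch. This is not the authors' implementation. `beam_search_segment`, `segment_score`, and `transition_score` are hypothetical names. The two score functions stand in for the 4DHOI model's geometric, temporal, and semantic terms, and contextual object localization is left out. Each beam entry is a partial parse covering frames `[0, t)`. Because only the top `beam_width` entries are kept at each frame, the search is approximate rather than exact.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A segment is (start_frame, end_frame_exclusive, event_label).
Segment = Tuple[int, int, str]
Hypothesis = Tuple[float, List[Segment]]


def beam_search_segment(
    num_frames: int,
    labels: List[str],
    segment_score: Callable[[str, int, int], float],
    transition_score: Callable[[str, str], float],
    max_len: int,
    beam_width: int = 5,
) -> Hypothesis:
    """Jointly segment frames [0, num_frames) and label each segment.

    Hypothetical sketch: segment_score and transition_score are placeholders
    for a model's per-segment and event-transition log-scores.
    """
    # beams[t] keeps the best partial parses that cover frames [0, t).
    beams: Dict[int, List[Hypothesis]] = {0: [(0.0, [])]}
    for t in range(1, num_frames + 1):
        candidates: List[Hypothesis] = []
        # The last segment ends at t and has length 1..max_len.
        for length in range(1, min(max_len, t) + 1):
            start = t - length
            for prev_score, segs in beams[start]:
                prev_label: Optional[str] = segs[-1][2] if segs else None
                for label in labels:
                    score = prev_score + segment_score(label, start, t)
                    if prev_label is not None:
                        score += transition_score(prev_label, label)
                    candidates.append((score, segs + [(start, t, label)]))
        # Beam pruning: keep only the top-scoring hypotheses ending at t.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams[t] = candidates[:beam_width]
    return beams[num_frames][0]


if __name__ == "__main__":
    # Toy per-frame scores for two hypothetical atomic events.
    frame_scores = [{"reach": 1.0, "drink": 0.0}] * 3 + [{"reach": 0.0, "drink": 1.0}] * 3

    def seg_score(label: str, start: int, end: int) -> float:
        return sum(frame_scores[i][label] for i in range(start, end))

    def trans_score(prev: str, nxt: str) -> float:
        # Penalize splitting one event into two same-label segments.
        return -1.0 if prev == nxt else 0.0

    best_score, best_segments = beam_search_segment(
        6, ["reach", "drink"], seg_score, trans_score, max_len=6
    )
    print(best_score, best_segments)
```

In this toy example, two hypothetical atomic events, `reach` and `drink`, are scored per frame. A penalty on consecutive same-label segments discourages the search from splitting one event into pieces. As a result, the best parse is `[(0, 3, 'reach'), (3, 6, 'drink')]` with score 6.0.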

Similar articles

FusionVision: A Comprehensive Approach of 3D Object Reconstruction and Segmentation from RGB-D Cameras Using YOLO and Fast Segment Anything.

Sensors (Basel). 2024 Apr 30;24(9):2889. doi: 10.3390/s24092889.

Image Representations with Spatial Object-to-Object Relations for RGB-D Scene Recognition.

IEEE Trans Image Process. 2019 Aug 13. doi: 10.1109/TIP.2019.2933728.

Video Object Discovery and Co-Segmentation with Extremely Weak Supervision.

IEEE Trans Pattern Anal Mach Intell. 2017 Oct;39(10):2074-2088. doi: 10.1109/TPAMI.2016.2612187. Epub 2016 Oct 26.

A neuromorphic dataset for tabletop object segmentation in indoor cluttered environment.

Sci Data. 2024 Jan 25;11(1):127. doi: 10.1038/s41597-024-02920-1.

A Multi-Modal, Discriminative and Spatially Invariant CNN for RGB-D Object Labeling.

IEEE Trans Pattern Anal Mach Intell. 2018 Sep;40(9):2051-2065. doi: 10.1109/TPAMI.2017.2747134. Epub 2017 Aug 30.

Hierarchical Context Modeling for Video Event Recognition.

IEEE Trans Pattern Anal Mach Intell. 2017 Sep;39(9):1770-1782. doi: 10.1109/TPAMI.2016.2616308. Epub 2016 Oct 11.

RGB-D Object SLAM Using Quadrics for Indoor Environments.

Sensors (Basel). 2020 Sep 9;20(18):5150. doi: 10.3390/s20185150.

Joint Video Object Discovery and Segmentation by Coupled Dynamic Markov Networks.

IEEE Trans Image Process. 2018 Dec;27(12):5840-5853. doi: 10.1109/TIP.2018.2859622. Epub 2018 Jul 30.

Monocular visual scene understanding: understanding multi-object traffic scenes.

IEEE Trans Pattern Anal Mach Intell. 2013 Apr;35(4):882-97. doi: 10.1109/TPAMI.2012.174.
