ETH Zurich, Sternwartstrasse 7, Zurich CH-8092, Switzerland.
IEEE Trans Pattern Anal Mach Intell. 2012 Mar;34(3):601-14. doi: 10.1109/TPAMI.2011.158.
We introduce a weakly supervised approach for learning human actions modeled as interactions between humans and objects. Our approach is human-centric: We first localize a human in the image and then determine the object relevant for the action and its spatial relation with the human. The model is learned automatically from a set of still images annotated only with the action label. Our approach relies on a human detector to initialize the model learning. For robustness to various degrees of visibility, we build a detector that learns to combine a set of existing part detectors. Starting from humans detected in a set of images depicting the action, our approach determines the action object and its spatial relation to the human. Its final output is a probabilistic model of the human-object interaction, i.e., the spatial relation between the human and the object. We present an extensive experimental evaluation on the sports action data set from [1], the PASCAL Action 2010 data set [2], and a new human-object interaction data set.
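As an illustration of what a probabilistic model of the human-object spatial relation could look like, the following minimal sketch describes the object's position and scale relative to the detected human box and fits an independent Gaussian to each feature. The abstract does not state the parametric form the authors use, so the Gaussian choice, the feature definition, and the function names (`relative_features`, `fit_gaussian`, `log_likelihood`) are all assumptions made for illustration, not the paper's method.

```python
import math
from statistics import mean, pvariance


def relative_features(human, obj):
    """Describe an object box relative to a human box.

    Boxes are (x_min, y_min, x_max, y_max). The offset is measured in units
    of the human box size and the scale is a log area ratio, so the features
    do not depend on where or how large the human appears in the image.
    (Hypothetical feature choice, not taken from the paper.)
    """
    hx, hy = (human[0] + human[2]) / 2, (human[1] + human[3]) / 2
    hw, hh = human[2] - human[0], human[3] - human[1]
    ox, oy = (obj[0] + obj[2]) / 2, (obj[1] + obj[3]) / 2
    ow, oh = obj[2] - obj[0], obj[3] - obj[1]
    return ((ox - hx) / hw, (oy - hy) / hh, math.log((ow * oh) / (hw * hh)))


def fit_gaussian(features, min_var=1e-4):
    """Fit an independent Gaussian (mean, variance) to each feature dimension.

    min_var keeps the variance positive when all training values coincide.
    """
    columns = list(zip(*features))
    return [(mean(c), max(pvariance(c), min_var)) for c in columns]


def log_likelihood(model, feature):
    """Log-density of one feature vector under the fitted diagonal Gaussian."""
    total = 0.0
    for (mu, var), x in zip(model, feature):
        total += -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
    return total


# Toy (human box, object box) pairs, invented purely for illustration.
training_pairs = [
    ((0, 0, 10, 20), (10, 5, 15, 10)),
    ((100, 50, 120, 90), (120, 60, 130, 70)),
    ((0, 0, 20, 40), (21, 9, 31, 19)),
]
model = fit_gaussian([relative_features(h, o) for h, o in training_pairs])
```

Under these assumptions, a candidate object whose relative position and scale resemble the training pairs receives a higher `log_likelihood` than one placed far from the human, which is the kind of score that could be used to rank candidate objects for an action.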