Henlein Alexander, Gopinath Anju, Krishnaswamy Nikhil, Mehler Alexander, Pustejovsky James
Text Technology Lab, Faculty of Computer Science and Mathematics, Institute of Computer Science, Goethe University Frankfurt, Frankfurt, Germany.
Situated Grounding and Natural Language Lab, Department of Computer Science, Colorado State University, Fort Collins, CO, United States.
Front Artif Intell. 2023 Jan 30;6:1084740. doi: 10.3389/frai.2023.1084740. eCollection 2023.
While affordance detection and human-object interaction (HOI) detection are related tasks, the theoretical foundation of affordances makes it clear that the two are distinct. In particular, affordance researchers distinguish between J. J. Gibson's traditional definition of an affordance, "the action possibilities" of the object within the environment, and the definition of a telic affordance, one defined by conventionalized purpose or use. We augment the HICO-DET dataset with annotations for Gibsonian and telic affordances, and a subset of the dataset with annotations for the orientation of the humans and objects involved. We then train an adapted HOI model and evaluate a pre-trained viewpoint estimation system on this augmented dataset. Our model, AffordanceUPT, is based on a two-stage adaptation of the Unary-Pairwise Transformer (UPT), which we modularize to make affordance detection independent of object detection. Our approach generalizes to new objects and actions, can effectively make the Gibsonian/telic distinction, and shows that this distinction is correlated with features in the data that are not captured by the HOI annotations of the HICO-DET dataset.