Institute for Physics 3 - Biophysics and Bernstein Center for Computational Neuroscience (BCCN), University of Göttingen, Göttingen, Germany.
Department of Psychology, University of Münster, Münster, Germany.
PLoS One. 2020 Dec 28;15(12):e0243829. doi: 10.1371/journal.pone.0243829. eCollection 2020.
Predicting other people's upcoming actions is key to successful social interactions. Previous studies have started to disentangle the various sources of information that action observers exploit, including objects, movements, contextual cues and features of the acting person's identity. Here we focus on the role of static and dynamic inter-object spatial relations that change during an action. We designed a virtual reality setup and tested recognition speed for ten different manipulation actions. Importantly, all objects had been abstracted by emulating them with cubes, so that participants could not infer an action from object information. Instead, participants had to rely only on the limited information that comes from the changes in the spatial relations between the cubes. In spite of these constraints, participants were able to predict actions after observing, on average, less than 64% of an action's duration. Furthermore, we employed a computational model, the so-called enriched Semantic Event Chain (eSEC), which incorporates information from different types of spatial relations: (a) objects' touching/untouching, (b) static spatial relations between objects and (c) dynamic spatial relations between objects during an action. Assuming the eSEC as an underlying model, we show, using information-theoretical analysis, that humans mostly rely on a mixed-cue strategy when predicting actions. Machine-based action prediction is able to produce faster decisions based on individual cues. We argue that the human strategy, though slower, may be particularly beneficial for predicting natural and more complex actions with more variable or partial sources of information. Our findings contribute to the understanding of how individuals can infer the goals of observed actions even before full goal accomplishment, and may open new avenues for building robots for conflict-free human-robot cooperation.