Mitev Nikolina, Renner Patrick, Pfeiffer Thies, Staudte Maria
CITEC, Universität des Saarlandes, Campus C7.4 (2.04), Saarbrücken, 66123, Germany.
CITEC, Bielefeld University, Inspiration 1, Bielefeld, 33619, Germany.
Cogn Res Princ Implic. 2018 Dec 29;3(1):51. doi: 10.1186/s41235-018-0148-x.
Referential success is crucial for collaborative task-solving in shared environments. In face-to-face interactions, humans, therefore, exploit speech, gesture, and gaze to identify a specific object. We investigate if and how the gaze behavior of a human interaction partner can be used by a gaze-aware assistance system to improve referential success. Specifically, our system describes objects in the real world to a human listener using on-the-fly speech generation. It continuously interprets listener gaze and implements alternative strategies to react to this implicit feedback. We used this system to investigate an optimal strategy for task performance: providing an unambiguous, longer instruction right from the beginning, or starting with a shorter, yet ambiguous instruction. Further, the system provides gaze-driven feedback, which could be either underspecified ("No, not that one!") or contrastive ("Further left!"). As expected, our results show that ambiguous instructions followed by underspecified feedback are not beneficial for task performance, whereas contrastive feedback results in faster interactions. Interestingly, this approach even outperforms unambiguous instructions (manipulation between subjects). However, when the system alternates between underspecified and contrastive feedback to initially ambiguous descriptions in an interleaved manner (within subjects), task performance is similar for both approaches. This suggests that listeners engage more intensely with the system when they can expect it to be cooperative. This, rather than the actual informativity of the spoken feedback, may determine the efficiency of information uptake and performance.
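The two gaze-driven feedback types described above can be illustrated with a minimal sketch. This is not the authors' implementation: the `SceneObject` type, the `gaze_feedback` function, the one-dimensional `x` coordinate, and the "Further right!" utterance are all hypothetical names and assumptions introduced for illustration; only the two quoted utterances "No, not that one!" and "Further left!" come from the abstract. The sketch also assumes that objects occupy distinct horizontal positions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SceneObject:
    """Hypothetical scene object; x is an assumed horizontal position (smaller = further left)."""
    name: str
    x: float


def gaze_feedback(fixated: SceneObject, target: SceneObject, strategy: str) -> Optional[str]:
    """Return spoken feedback for the currently fixated object, or None if no correction is needed.

    strategy is "underspecified" or "contrastive" (labels taken from the abstract;
    the selection logic itself is an assumption).
    """
    if fixated == target:
        # Listener is already looking at the intended object: no corrective feedback.
        return None
    if strategy == "underspecified":
        # Rejects the fixated object without saying where the target is.
        return "No, not that one!"
    if strategy == "contrastive":
        # Describes the target's position relative to the fixated object.
        # "Further right!" is an assumed counterpart to the abstract's "Further left!".
        return "Further left!" if target.x < fixated.x else "Further right!"
    raise ValueError(f"unknown strategy: {strategy!r}")


if __name__ == "__main__":
    target = SceneObject("red block", x=0.2)
    distractor = SceneObject("blue block", x=0.8)
    print(gaze_feedback(distractor, target, "underspecified"))
    print(gaze_feedback(distractor, target, "contrastive"))
```

In this sketch the contrastive branch carries directional information that the underspecified branch lacks, which is the difference in informativity that the study manipulates.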