Greco Claudio, Bagade Diksha, Le Dieu-Thu, Bernardi Raffaella
CIMeC, University of Trento, Rovereto, TN, Italy.
Amazon Alexa AI, Berlin, Germany.
Front Artif Intell. 2023 Mar 9;6:1017204. doi: 10.3389/frai.2023.1017204. eCollection 2023.
Communication is a dynamic process through which interlocutors adapt to each other. In the development of conversational agents, this core aspect was set aside for several years because the main challenge was to obtain conversational neural models able to produce utterances and dialogues that are human-like at least at the surface level. Now that this milestone has been reached, several position papers have advocated paying attention to the dynamic and adaptive interactive aspects of language. In this paper, we focus on how a Speaker adapts to an interlocutor with different background knowledge. Our models undergo a pre-training phase, in which they acquire grounded knowledge by learning to describe an image, and an adaptive phase, in which a Speaker and a Listener play a repeated reference game. Previous studies using a similar setting focus on how conversational models create new conventions; we instead study whether the Speaker learns from the Listener's mistakes to adapt to the Listener's background knowledge. We evaluate models based on the Rational Speech Act (RSA) framework, a likelihood loss, and a combination of the two. We show that RSA could indeed work as a backbone to drive the Speaker toward the Listener: in the combined model, in addition to the improved Listener accuracy, the language generated by the Speaker shows changes that signal adaptation to the Listener's background knowledge. Specifically, captions for unknown object categories contain more adjectives and fewer direct references to the unknown objects.
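For readers unfamiliar with RSA, the following is a minimal sketch of the textbook RSA recursion (a literal listener and a pragmatic speaker) on a toy reference game. It is not the neural formulation evaluated in the paper. The lexicon, the two toy images, the uniform prior, and the function names are all assumptions made for illustration.

```python
def normalize(weights):
    """Scale non-negative weights so they sum to 1 (assumes a non-zero total)."""
    total = sum(weights)
    return [w / total for w in weights]


def literal_listener(lexicon, prior):
    """L0(target | utterance): one row per utterance, one column per target.

    lexicon[u][t] is 1 if utterance u is true of target t, else 0.
    Assumes every utterance is true of at least one target.
    """
    return [normalize([truth * p for truth, p in zip(row, prior)])
            for row in lexicon]


def pragmatic_speaker(lexicon, prior, alpha=1.0):
    """S1(utterance | target), proportional to L0(target | utterance) ** alpha.

    alpha is the usual rationality parameter; utterance costs are omitted.
    Assumes every target is described by at least one utterance.
    """
    l0 = literal_listener(lexicon, prior)
    speaker = []
    for t in range(len(prior)):
        scores = [l0[u][t] ** alpha for u in range(len(lexicon))]
        speaker.append(normalize(scores))
    return speaker


# Hypothetical toy game: image 0 shows a fluffy dog, image 1 a short-haired dog.
utterances = ["dog", "fluffy dog"]
lexicon = [
    [1, 1],  # "dog" is true of both images
    [1, 0],  # "fluffy dog" is true of image 0 only
]
prior = [0.5, 0.5]  # the listener has no initial preference between images

speaker = pragmatic_speaker(lexicon, prior)
for t, dist in enumerate(speaker):
    print(f"image {t}:", dict(zip(utterances, dist)))
```

In this toy game the pragmatic speaker prefers the more specific "fluffy dog" for image 0 (probability 2/3) because the literal listener resolves it unambiguously, which is the kind of listener-driven pressure the abstract attributes to the RSA component.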