Angus Addlesee, Arash Eshghi
Interaction Lab, Heriot-Watt University, Edinburgh, United Kingdom.
Alana AI, Edinburgh, United Kingdom.
Front Dement. 2024 Mar 12;3:1343052. doi: 10.3389/frdem.2024.1343052. eCollection 2024.
In spontaneous conversation, speakers seldom have a full plan of what they are going to say in advance: they need to conceptualise and plan as they articulate each word in turn. This often leads to long pauses mid-utterance. Listeners either wait out the pause, offer a possible completion, or respond with an incremental clarification request (iCR), intended to recover the rest of the truncated turn. The ability to generate iCRs in response to pauses is therefore important in building everyday voice assistants (EVAs) such as Amazon Alexa. This becomes crucial with people with dementia (PwDs) as a target user group since they are known to pause longer and more frequently, with current state-of-the-art EVAs interrupting them prematurely, leading to frustration and breakdown of the interaction. In this article, we first use two existing corpora of truncated utterances to establish the generation of clarification requests as an effective strategy for recovering from interruptions. We then proceed to report on, analyse, and release SLUICE-CR: a new corpus of 3,000 crowdsourced, human-produced iCRs, the first of its kind. We use this corpus to probe the incremental processing capability of a number of state-of-the-art large language models (LLMs) by evaluating (1) the quality of the model's generated iCRs in response to incomplete questions and (2) the ability of these LLMs to respond correctly to the user's response to the generated iCR. For (1), our experiments show that the ability to generate contextually appropriate iCRs only emerges at larger LLM sizes and only when prompted with example iCRs from our corpus. For (2), our results are in line with (1), that is, that larger LLMs interpret incremental clarificational exchanges more effectively.
Overall, our results indicate that autoregressive language models (LMs) are, in principle, able to both understand and generate language incrementally and that LLMs can be configured to handle speech phenomena more commonly produced by PwDs, mitigating frustration with today's EVAs by improving their accessibility.
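The abstract notes that contextually appropriate iCRs only emerge when larger LLMs are prompted with example iCRs from the corpus, i.e. via few-shot prompting. The following is a minimal, hypothetical sketch of how such a few-shot prompt might be assembled; the example pairs and the prompt wording are invented for illustration and are not drawn from SLUICE-CR or the paper's actual setup.

```python
# Hypothetical few-shot prompt construction for iCR generation.
# Each example pairs a truncated user turn with a possible iCR;
# these pairs are illustrative only, not from the SLUICE-CR corpus.
FEWSHOT_EXAMPLES = [
    ("What is the capital of ...", "The capital of where?"),
    ("How many moons does ...", "How many moons does what have?"),
]

def build_icr_prompt(truncated_utterance, examples=FEWSHOT_EXAMPLES):
    """Assemble a few-shot prompt asking an LLM to produce an
    incremental clarification request (iCR) for a truncated turn."""
    lines = [
        "A speaker pauses mid-utterance. Ask a short clarification "
        "question that recovers the missing part of their turn.",
        "",
    ]
    for partial, icr in examples:
        lines.append(f"Speaker: {partial}")
        lines.append(f"Assistant: {icr}")
        lines.append("")
    # The model would complete the final "Assistant:" line with an iCR.
    lines.append(f"Speaker: {truncated_utterance}")
    lines.append("Assistant:")
    return "\n".join(lines)

prompt = build_icr_prompt("When was the Eiffel Tower ...")
print(prompt)
```

The resulting string would then be sent to an autoregressive LLM, whose completion is the candidate iCR evaluated for contextual appropriateness.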