Kim Da-Young, Lym Hyo Jeong, Lee Hanna, Lee Ye Jun, Kim Juhyun, Kim Min-Gyu, Baek Yunju
Human-Robot Interaction Center, Korea Institute of Robotics & Technology Convergence (KIRO), Pohang 37553, Republic of Korea.
Department of Information Convergence Engineering, Pusan National University, Busan 46241, Republic of Korea.
Sensors (Basel). 2024 Dec 12;24(24):7939. doi: 10.3390/s24247939.
To converse naturally in child-robot interactions, a dialogue system must infer the intent behind children's utterances while accounting for their distinctive linguistic characteristics, such as syntactic incompleteness, pronunciation inaccuracies, and creative expressions. Even state-of-the-art large language models (LLMs), despite their strength in language understanding and contextual awareness, cannot comprehend children's intent as accurately as humans do because of these characteristics. To improve its intention reasoning in verbal interactions with children, an LLM-based dialogue system should therefore learn how humans understand children's speech. To this end, we propose a fine-tuning methodology that uses two kinds of data: LLM-human judgment discrepancy data and interactive response data. The former comprise cases in which the LLM and human judgments of the contextual appropriateness of a child's answer to a robot's question diverge. The latter comprise LLM-generated robot responses suited to the intent of children's utterances. Using these datasets, we developed a fine-tuned dialogue system that interprets children's utterances in a human-like manner and responds adaptively. The system was evaluated through human assessment using the Robotic Social Attributes Scale (RoSAS) and Sensibleness and Specificity Average (SSA) metrics. The results indicate that it supports effective interpretation of children's utterance intentions and enables natural verbal interactions, even in cases with syntactic incompleteness and mispronunciations.
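As a rough illustration of the discrepancy data described above, the following minimal sketch selects the exchanges in which an LLM and a human annotator disagree about whether a child's answer fits the robot's question, and then pairs each one with the human judgment as the training target. The abstract does not specify the label format or record schema, so the binary labels, the class and function names, the prompt wording, and the sample exchanges are all hypothetical and are not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class JudgedExchange:
    """One robot question and child answer with two appropriateness judgments.

    All field names are assumptions made for this sketch.
    """
    robot_question: str
    child_answer: str
    llm_appropriate: bool    # LLM's judgment of contextual appropriateness
    human_appropriate: bool  # human annotator's judgment


def select_discrepancies(exchanges):
    """Keep only the exchanges where the LLM and human judgments diverge."""
    return [e for e in exchanges if e.llm_appropriate != e.human_appropriate]


def to_finetuning_record(exchange):
    """Build a prompt/target pair that uses the human judgment as the target."""
    prompt = (
        f"Robot: {exchange.robot_question}\n"
        f"Child: {exchange.child_answer}\n"
        "Is the child's answer contextually appropriate?"
    )
    completion = "yes" if exchange.human_appropriate else "no"
    return {"prompt": prompt, "completion": completion}


# Invented sample data, for illustration only.
EXAMPLES = [
    JudgedExchange("What did you eat for lunch?", "Pasketti!", False, True),
    JudgedExchange("What is your favorite animal?", "A dog.", True, True),
]

if __name__ == "__main__":
    for record in map(to_finetuning_record, select_discrepancies(EXAMPLES)):
        print(record)
```

Only the first sample exchange is kept here, because it is the only one on which the two judgments differ. Using the human label as the target reflects the stated goal of having the system acquire the way humans understand children's speech.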