Imaezue Gerald C, Marampelly Harikrishna
Department of Communication Sciences and Disorders, University of South Florida, Tampa.
Department of Computer Science and Engineering, University of South Florida, Tampa.
J Speech Lang Hear Res. 2025 Jul 8;68(7):3322-3336. doi: 10.1044/2025_JSLHR-25-00003. Epub 2025 Jun 13.
Development of aphasia therapies is limited by clinician shortages, patient recruitment challenges, and funding constraints. To address these barriers, we introduce ABCD, a novel method for simulating goal-driven natural spoken dialogues between two conversational artificial intelligence (AI) agents: an AI clinician (Re-Agent) and an AI patient (AI-Aphasic), which vocally mimics aphasic errors. Using ABCD, we simulated response elaboration training between the two agents with stimuli varying in semantic constraint (high via pictures, low via topics). Rather than resource-intensive fine-tuning, we leveraged prompt engineering, chain-of-thought (CoT), and zero-shot techniques for rapid, cost-effective agent development and piloting.
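The paper does not publish its prompts, so the following is only an illustrative sketch of how the zero-shot versus CoT contrast for the clinician agent might be expressed; every string and function name here is hypothetical, not the authors' material.

```python
# Hypothetical sketch of zero-shot vs. chain-of-thought (CoT) system
# prompts for an AI-clinician agent; strings are illustrative only.

ZERO_SHOT_CLINICIAN = (
    "You are a speech-language clinician delivering response elaboration "
    "training. Respond directly to the patient's utterance: model a "
    "grammatically correct expansion of what they said, then ask one "
    "follow-up question to elicit more detail."
)

COT_CLINICIAN = (
    "You are a speech-language clinician delivering response elaboration "
    "training. Before replying, reason step by step: (1) identify the "
    "patient's intended message despite aphasic errors, (2) decide which "
    "content words to preserve, (3) compose an expanded, grammatical "
    "restatement, (4) add one question that invites elaboration. "
    "Only the final reply is spoken to the patient."
)

def build_messages(technique: str, stimulus: str, patient_utterance: str) -> list[dict]:
    """Assemble a chat-style message list for the clinician agent.

    technique: "zero-shot" or "cot"; stimulus: a picture description or a
    conversation topic (the high- vs. low-semantic-constraint conditions).
    """
    system = ZERO_SHOT_CLINICIAN if technique == "zero-shot" else COT_CLINICIAN
    return [
        {"role": "system", "content": f"{system}\nStimulus: {stimulus}"},
        {"role": "user", "content": patient_utterance},
    ]
```

In a real pipeline, the returned message list would be passed to the underlying large language model; swapping the system string is what distinguishes the two prompting conditions.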
Built on OpenAI's GPT-4o as the foundational large language model, Re-Agent and AI-Aphasic were supplemented with external speech-to-text and naturalistic text-to-speech application programming interfaces to create a multiturn, dynamic dialogue system in English. We used this system to evaluate Re-Agent's conversational performance across four experimental conditions (CoT + picture, CoT + topic, zero-shot + picture, zero-shot + topic) and two levels of aphasic error: word-level and discourse-level errors. Re-Agent's performance was measured using three discourse metrics: global coherence, local coherence, and grammaticality of utterances.
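The turn-taking architecture described above can be sketched as a minimal loop in which each agent's text passes through text-to-speech on the speaker's side and speech-to-text on the listener's side. This is a stand-in sketch only: the agents and speech interfaces are plain callables, where a real system would wire in GPT-4o and the external STT/TTS APIs.

```python
# Minimal sketch of a two-agent, multiturn spoken-dialogue loop in the
# spirit of ABCD. LLM, speech-to-text (STT), and text-to-speech (TTS)
# calls are stand-in callables, not real API clients.
from typing import Callable

def run_dialogue(
    clinician: Callable[[str], str],  # text in -> clinician reply text
    patient: Callable[[str], str],    # text in -> (errorful) patient text
    stt: Callable[[bytes], str],      # audio -> text
    tts: Callable[[str], bytes],      # text -> audio
    opening: str,
    turns: int,
) -> list[tuple[str, str]]:
    """Alternate clinician/patient turns, routing every utterance
    through TTS (speaker) and STT (listener), and log the transcript."""
    transcript: list[tuple[str, str]] = []
    heard = opening
    for _ in range(turns):
        reply = clinician(heard)
        transcript.append(("Re-Agent", reply))
        heard = stt(tts(reply))        # clinician speaks, patient listens
        response = patient(heard)
        transcript.append(("AI-Aphasic", response))
        heard = stt(tts(response))     # patient speaks, clinician listens
    return transcript
```

With identity stubs (`tts=lambda t: t.encode()`, `stt=lambda a: a.decode()`) the loop runs end to end without any external service, which is convenient for piloting the dialogue logic itself.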
Overall, Re-Agent performed accurately on all discourse metrics across every level of semantic constraint, prompting technique, and aphasic error. The results also indicated that well-crafted zero-shot prompts induce more direct and logically related responses that are robust to adversarial aphasic speech inputs, whereas CoT may yield responses that slightly lose local coherence because of its additional reasoning chains.
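The study's coherence and grammaticality measures are rated discourse metrics, not automated scores. Purely as a toy illustration of what "local coherence" tracks, adjacent-utterance lexical overlap could serve as a crude proxy; the function names and the stop-word list below are hypothetical.

```python
# Toy illustration only: a crude lexical-overlap proxy for local
# coherence between adjacent utterances. The study's actual metrics
# (global coherence, local coherence, grammaticality) are rated
# discourse measures, not this heuristic.

def content_words(utterance: str) -> set[str]:
    """Lowercased tokens with a few common function words removed."""
    stop = {"the", "a", "an", "is", "are", "and", "to", "of", "in", "it"}
    return {w.strip(".,!?").lower() for w in utterance.split()} - stop

def local_overlap(utterances: list[str]) -> float:
    """Mean Jaccard overlap of content words between adjacent utterances."""
    pairs = list(zip(utterances, utterances[1:]))
    if not pairs:
        return 0.0
    scores = []
    for prev, cur in pairs:
        a, b = content_words(prev), content_words(cur)
        scores.append(len(a & b) / len(a | b) if a | b else 0.0)
    return sum(scores) / len(scores)
```

A response that drifts off-topic (as CoT reasoning chains occasionally did) would share fewer content words with the preceding utterance and score lower under such a heuristic.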
ABCD represents a foundational computational approach to accelerating the innovation and preclinical testing of conversational AI partners for speech-language therapy. ABCD circumvents the barriers of collecting diverse errorful speech samples for clinical conversational AI fine-tuning. As AI systems, including large language models and speech technologies, advance rapidly, ABCD will scale accordingly, further enhancing its potential for clinical integration.