Lee Chanseo, Kumar Sonu, Vogt Kimon A, Munshi Muhammad, Tallapudi Panindhra, Vogt Antonia, Awad Hamzeh, Khan Wasim
Sporo Health, Boston, MA, USA.
Department of Anesthesiology, Yale School of Medicine, New Haven, CT, 06520, USA.
Sci Rep. 2025 Jul 29;15(1):27619. doi: 10.1038/s41598-025-10451-x.
The growing demand for multilingual capabilities in healthcare technology highlights the need for AI solutions that can handle underrepresented languages, such as Arabic, in clinical documentation. Arabic's linguistic complexities (morphological richness, syntactic variation, and diglossia) pose significant challenges for foundational large language models (LLMs), especially in domain-specific tasks such as medical summarization. This study introduces AraSum, a domain-specific AI agent built with a novel knowledge distillation framework that transforms large multilingual LLMs into lightweight, task-optimized small language models (SLMs). Trained and evaluated on a synthetic dataset of Arabic medical dialogues, AraSum outperforms JAIS-30B, a foundational Arabic LLM, on key automatic evaluation metrics, including BLEU and ROUGE scores. AraSum also outperforms JAIS in assessments by Arabic-speaking evaluators of accuracy, comprehensiveness, and clinical utility, while maintaining comparable linguistic quality as measured by a modified PDQI-9 inventory. AraSum achieves these results at significantly lower computational and environmental cost, demonstrating that resource-efficient AI models can feasibly be deployed in low-resource settings for domain-specific tasks. This work underscores the potential of SLM-based agentic architectures for advancing multilingual healthcare, encouraging sustainable artificial intelligence, and fostering equity in access to care.
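For readers unfamiliar with the automatic metrics named in the abstract, the following is a minimal sketch of ROUGE-1 (unigram overlap between a reference summary and a generated one). It is an illustration only: the function name `rouge1`, the whitespace tokenization, and the example sentences are assumptions made here, and the abstract does not say which implementation or preprocessing the authors used. Arabic text would normally need more careful normalization and tokenization than a plain split.

```python
from collections import Counter


def rouge1(reference: str, candidate: str) -> dict:
    """Unigram-overlap ROUGE-1 precision, recall, and F1.

    Hypothetical helper for illustration; tokens are split on whitespace,
    which is a simplifying assumption (no stemming or normalization).
    """
    ref_counts = Counter(reference.split())
    cand_counts = Counter(candidate.split())

    # Clipped overlap: each token counts at most as often as it
    # appears in both the reference and the candidate.
    overlap = sum((ref_counts & cand_counts).values())

    ref_total = sum(ref_counts.values())
    cand_total = sum(cand_counts.values())

    recall = overlap / ref_total if ref_total else 0.0
    precision = overlap / cand_total if cand_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Made-up sentences, not data from the study.
    reference = "the patient reports chest pain"
    candidate = "patient reports mild chest pain"
    print(rouge1(reference, candidate))
```

BLEU is built on a related idea, but it is precision-oriented, combines several n-gram orders, and applies a brevity penalty, so a higher ROUGE-1 score does not by itself imply a higher BLEU score.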