Ferreira Anderson A, Rocha Leonardo, Cunha Washington, Machado Ana Cláudia, Campos João Marcos, Jallais Gabriel, Viana Adriana C F, Tuler Elisa, Araújo Iago, Macul Víctor, Souza Neto Olívio, de Souza Júnior Antônio Pereira, de Pinho Souza Giordano, Pallone Joice Marques, Dumbá Soares Mariana Aparecida, Santos Welton Augusto, Gonçalves Marcos André
Computer Science Department, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil.
Computer Science Department, Universidade Federal de Ouro Preto, Ouro Preto, Minas Gerais, Brazil.
Sci Rep. 2025 Aug 27;15(1):31660. doi: 10.1038/s41598-025-13560-9.
This study evaluates the ability of Large Language Models (LLMs) to summarize real-world dialogues between patients and the healthcare team of an e-health company that provides digital healthcare services and communicates primarily via WhatsApp. The team needs quick access to patient information to deliver accurate and personalized responses. We examine summarization of past messages as a way to meet this need, aiming for concise, non-redundant, and truthful summaries that capture the main characteristics of each dialogue despite noisy, informal real-world content in an underrepresented language, Portuguese. To do so, we collected an anonymized Portuguese dataset of WhatsApp messages exchanged between patients and the healthcare team. Dialogue quality was assessed for size, readability, and correctness before summaries were generated with LLaMA3 and Qwen2 using specific prompts. Volunteers rated these summaries on coverage, relevance, redundancy, and veracity using a 5-point Likert scale. Our qualitative and quantitative experimental results indicate that LLMs can produce effective summaries of dialogues between patients and healthcare teams, even when faced with low-quality data in an underrepresented language. Given the challenging scenario, this result is surprising. Of the two LLMs tested, LLaMA3 showed a slight edge over Qwen2 in coverage and veracity. Our results demonstrate the potential to build practical real-world services that help healthcare professionals respond to patient messages with agility, clarity, and cohesion, enhancing both communication efficiency and patient satisfaction. Ultimately, the advocated approach could significantly improve online healthcare communication, particularly in resource-constrained settings such as Brazil, where access to primary care is limited.