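A comprehensive qualitative analysis of patient dialogue summarization using large language models applied to noisy, informal, non-English real-world data.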

This study evaluates the ability of Large Language Models (LLMs) to summarize real-world dialogues between patients and the healthcare team of an e-health company that provides digital healthcare services and communicates primarily via WhatsApp. The team needs quick access to patient information to deliver accurate and personalized responses. We examine summarization of past messages as a way to provide that access, aiming for concise, non-redundant, and truthful summaries that capture the main characteristics of each dialogue despite noisy, informal real-world content in an underrepresented language, Portuguese. To this end, we collected an anonymized Portuguese dataset of WhatsApp messages exchanged between patients and the healthcare team. We assessed dialogue quality in terms of size, readability, and correctness, and then generated summaries with LLaMA3 and Qwen2 using specific prompts. Volunteers rated these summaries on coverage, relevance, redundancy, and veracity using a 5-point Likert scale. Our qualitative and quantitative results indicate that LLMs can produce effective summaries of dialogues between patients and healthcare teams, even when faced with low-quality data in an underrepresented language, a surprising result given the challenging scenario. Of the two LLMs tested, LLaMA3 showed a slight edge over Qwen2 in coverage and veracity. These results suggest that practical real-world services could be built to help healthcare professionals respond to patient messages with agility, clarity, and cohesion, improving both communication efficiency and patient satisfaction. Ultimately, the approach could significantly improve online healthcare communication, particularly in resource-constrained settings such as Brazil, where access to primary care is limited.
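The abstract does not say how the volunteer ratings were aggregated. As a minimal sketch only, assuming each rating is stored as a record with the hypothetical fields `model`, `criterion`, and `score`, per-model summaries of the 5-point Likert scores for the four criteria could be computed as follows. The function name and record layout are illustrative assumptions, not the authors' procedure.

```python
from statistics import mean, median

# The four evaluation criteria named in the abstract.
CRITERIA = ("coverage", "relevance", "redundancy", "veracity")


def aggregate_ratings(ratings):
    """Group Likert scores by (model, criterion) and summarize each group.

    `ratings` is an iterable of dicts with the assumed keys
    "model", "criterion", and "score" (an integer from 1 to 5).
    """
    grouped = {}
    for rating in ratings:
        criterion = rating["criterion"]
        score = rating["score"]
        if criterion not in CRITERIA:
            raise ValueError(f"unknown criterion: {criterion!r}")
        if not isinstance(score, int) or not 1 <= score <= 5:
            raise ValueError(f"score must be an integer in 1..5, got {score!r}")
        grouped.setdefault((rating["model"], criterion), []).append(score)

    # Likert data is ordinal, so the median is reported alongside the mean.
    return {
        key: {"n": len(scores), "mean": mean(scores), "median": median(scores)}
        for key, scores in grouped.items()
    }
```

Reporting the median next to the mean is a design choice for ordinal data; the study itself may have used a different statistic.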


