Oliveira Juliana Damasio, Santos Henrique D P, Ulbrich Ana Helena D P S, Couto Julia Colleoni, Arocha Marcelo, Santos Joaquim, Costa Manuela Martins, Faccio Daniela, Tabalipa Fabio O, Nogueira Rodrigo F
Institute of A.I. in Healthcare, Porto Alegre, RS, Brazil.
Memed, Florianópolis, SC, Brazil.
Commun Med (Lond). 2025 Aug 28;5(1):376. doi: 10.1038/s43856-025-01091-3.
Clinical notes are a vital and detailed source of information about patient hospitalizations, but their sheer volume and complexity make evaluation and summarization challenging. Summarizing them is nonetheless essential for accurate and efficient clinical decision-making in patient care. Generative language models, particularly large language models such as GPT-4, offer a promising solution by creating coherent, contextually relevant text based on patterns learned from large datasets.
This study describes the development of a discharge summary system using large language models. By conducting an online survey and interviews, we gather feedback from end users, including physicians and patients, to ensure the system meets their practical needs and fits their experiences. Additionally, we develop a rating system to evaluate prompt effectiveness by comparing model-generated outputs with human assessments, which serve as benchmarks to evaluate the performance of the automated model.
Here we show that the model's ability to interpret diagnoses borders on human-level accuracy, demonstrating its potential to assist healthcare professionals in routine tasks such as generating discharge summaries.
This advancement underscores the potential of large language models in clinical settings and opens up possibilities for broader applications in healthcare documentation and decision-making support.