Liu Siru, McCoy Allison B, Wright Aileen P, Carew Babatunde, Genkins Julian Z, Huang Sean S, Peterson Josh F, Steitz Bryan, Wright Adam
medRxiv. 2023 Jul 16:2023.07.14.23292669. doi: 10.1101/2023.07.14.23292669.
This study aimed to develop and assess the performance of fine-tuned large language models for generating responses to patient messages sent via an electronic health record patient portal.
Utilizing a dataset of messages and responses extracted from the patient portal at a large academic medical center, we developed a model (CLAIR-Short) based on a pre-trained large language model (LLaMA-65B). In addition, we used the OpenAI API to rewrite physician responses from an open-source dataset into informative paragraphs that offered patient education while emphasizing empathy and professionalism. Combining this dataset with the portal data, we further fine-tuned our model (CLAIR-Long). To evaluate the fine-tuned models, we used ten representative patient portal questions in primary care to generate responses. We asked primary care physicians to review the responses generated by our models and by ChatGPT and to rate them for empathy, responsiveness, accuracy, and usefulness.
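The response-rewriting step described above can be sketched as a call to the OpenAI chat completions endpoint. This is a minimal illustration only: the system prompt wording and the model name are assumptions, as the abstract says only that the OpenAI API was used.

```python
# Sketch of the response-rewriting step: sending a terse physician reply
# to the OpenAI chat completions endpoint so it is rewritten as
# informative, empathetic paragraphs. The prompt text and model name
# below are illustrative assumptions, not the authors' published setup.
import json
import os
import urllib.request

SYSTEM_PROMPT = (
    "Rewrite the physician's reply to the patient's portal message as "
    "informative paragraphs that offer patient education while "
    "emphasizing empathy and professionalism."
)

def build_payload(patient_msg: str, physician_reply: str) -> dict:
    """Assemble the JSON body for the chat completions request."""
    return {
        "model": "gpt-3.5-turbo",  # assumed; the abstract only says "OpenAI API"
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {
                "role": "user",
                "content": (
                    f"Patient message:\n{patient_msg}\n\n"
                    f"Physician reply:\n{physician_reply}"
                ),
            },
        ],
    }

def rewrite_reply(patient_msg: str, physician_reply: str) -> str:
    """Call the OpenAI API and return the rewritten response text."""
    req = urllib.request.Request(
        "https://api.openai.com/v1/chat/completions",
        data=json.dumps(build_payload(patient_msg, physician_reply)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Each rewritten (patient message, expanded response) pair would then serve as a fine-tuning example for the longer-form model.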
The dataset consisted of 499,794 pairs of patient messages and corresponding responses from the patient portal, along with 5,000 patient messages and ChatGPT-updated responses from an online platform. Four primary care physicians participated in the survey. CLAIR-Short generated concise responses similar to providers' own responses. CLAIR-Long responses provided more patient educational content than CLAIR-Short and were rated similarly to ChatGPT's responses, receiving positive evaluations for responsiveness, empathy, and accuracy, and a neutral rating for usefulness.
Leveraging large language models to generate responses to patient messages demonstrates significant potential in facilitating communication between patients and primary care providers.