Department of Anesthesia and Pain Medicine, Johns Hopkins All Children's Hospital, 601 5th St South, Suite C725, St Petersburg, FL, 33701, USA.
Epidemiology and Biostatistics Shared Resource, Institute for Clinical and Translational Research, Johns Hopkins All Children's Hospital, St Petersburg, FL, USA.
J Med Syst. 2024 Aug 22;48(1):77. doi: 10.1007/s10916-024-02100-z.
Increased patient access to electronic medical records and resources has resulted in higher volumes of health-related questions posed to clinical staff, while physicians' rising clinical workloads have left less time for comprehensive, thoughtful responses to patient questions. Artificial intelligence chatbots powered by large language models (LLMs) such as ChatGPT could help anesthesiologists respond efficiently to electronic patient inquiries, but their ability to do so is unclear. A cross-sectional, exploratory, survey-based study was performed comprising 100 anesthesia-related patient question/response sets based on two fictitious simple clinical scenarios. Each question was answered by an independent board-certified anesthesiologist and by ChatGPT (GPT-3.5 model, August 3, 2023 version). The responses were randomized and evaluated via survey by three blinded board-certified anesthesiologists on various quality and empathy measures. On a 5-point Likert scale, ChatGPT received overall quality ratings similar to the anesthesiologist's (4.2 vs. 4.1, p = .81) and significantly higher overall empathy ratings (3.7 vs. 3.4, p < .01). ChatGPT underperformed the anesthesiologist on the rate of responses in agreement with scientific consensus (96.6% vs. 99.3%, p = .02) and on the possibility of harm (4.7% vs. 1.7%, p = .04), but performed similarly on the remaining measures: the percentage of responses containing inappropriate/incorrect information (5.7% vs. 2.7%, p = .07) and missing information (10.0% vs. 7.0%, p = .19). In conclusion, LLMs show great potential in healthcare, but additional improvement is needed to decrease the risk of patient harm and reduce the need for close physician oversight. Further research with more complex clinical scenarios, more clinicians, and live patients is necessary to validate their role in healthcare.